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Detection of Obstacles in Monocular Image Sequences 


Preface 


This research was initiated as a part of the Synthetic Vision System (SVS) project for the 
development of a High Speed Civil Transport (HSCT) aircraft at NASA Ames Research 
Center. Primary goal of the research reported here is to develop image analysis algorithms 
to detect the runways/taxiways and obstacles in monocular image sequences obtained 
from two types of sensors, a Passive Millimeter Wave (PMMW) sensor and a video 
camera. The report is divided into two parts. The first part of this report aims at 
developing a model-based approach for detecting runways/taxiways and objects on the 
runway from a sequence of images obtained from a moving PMMW sensor. An 
approximate runway model and the position information of the camera provided by the 
Global Positioning System (GPS) is used to define the region of interest to search for 
image features corresponding to the runway markers. Once the runway is identified, a 
histogram-based thresholding is used to detect obstacles on and outside the runway. The 
second part addresses the problem of detection of objects in monocular image sequences 
obtained from an on-board video camera. A recursive motion-based segmentation 
algorithm based on planar motion model is used. The background motion due to camera is 
identified as the dominant motion and the outliers which do not follow this motion are 
identified as obstacles. 
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Abstract 


The ability to detect and locate runways/taxiways and obstacles in images captured using 
on-board sensors is an essential first step in the automation of low-altitude flight, landing, 
takeoff, and taxiing phase of aircraft navigation. Automation of these functions under 
different weather and lighting situations, can be facilitated by using sensors of different 
modalities. An aircraft- based Synthetic Vision System (SVS), with sensors of different 
modalities mounted on-board, complements the current ground-based systems in functions 
such as detection and prevention of potential runway collisions, airport surface navigation, 
and landing and takeoff in all weather conditions. 

In this report, we address the problem of detection of objects in monocular image 
sequences obtained from two types of sensors, a Passive Millimeter Wave (PMMW) 
sensor and a video camera mounted on-board a landing aircraft. Since the sensors differ in 
their spatial resolution, and the quality of the images obtained using these sensors is not 
the same, different approaches are used for detecting obstacles depending on the sensor 
type. These approaches are described separately in two parts of this report. 

The goal of the first part of the report is to develop a method for detecting 
runways/taxiways and objects on the runway in a sequence of images obtained from a 
moving PMMW sensor. Since the sensor resolution is low and the image quality is very 
poor, we propose a model-based approach for detecting runways/taxiways. We use the 
approximate runway model and the position information of the camera provided by the 
Global Positioning System (GPS) to define regions of interest in the image plane to search 
for the image features corresponding to the runway markers. Once the runway region is 
identified, we use histogram-based thresholding to detect obstacles on the runway and 
regions outside the runway. This algorithm is tested using image sequences simulated from 
a single real PMMW image. 

The camera position information provided by the GPS is not accurate. An 
alternative is to estimate the camera position using image-based features, such as points 
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and lines. The accuracy of such estimates, however, depends on the resolution of the 
sensor and the position of the sensor in 3-D. We propose an analytical model to compute 
the accuracy of the estimated camera position and develop equations for accuracy in terms 
of sensor resolution and the sensor position in 3-D. Analytical results obtained using this 
model for three different sensors at six different positions of the sensor are presented. We 
also propose a new algorithmic approach for computing the error in the estimated camera 
pose due to image plane quantization. Such theoretical analysis is useful in deciding the 
required camera resolution for use in the design of SVS for navigation. 

The second part of this report addresses the problem of detection of objects in 
monocular image sequences obtained from a video camera mounted on-board a landing 
aircraft. First, the optical flow is computed at comer-like feature points detected on the 
image using a comer detector. We assume that the runway is either planar or is piecewise 
planar and propose a recursive motion-based segmentation algorithm based on the planar 
motion model for segmenting flow vectors into separate regions. Model parameters are 
recovered for each of the region. The background motion due to camera is identified as 
the dominant motion and the outliers are identified as due to the obstacles. Various stages 
of the algorithm were tested using both synthetic and real image sequences obtained from 
actual flight test. 

Line features are considered to be prominent and more reliable than point features. 
In this report we have explored the use of line features for two purposes: (1) estimating 
the runway plane parameter using runway markers, and (2) tracking and estimating the 
position and velocity of the ground-based obstacles using object lines. The algorithm was 
tested using real image sequences. Since the angle between the projecting planes (i.e., the 
plane containing the runway marker and the corresponding line segment in the image 
plane) for a given line segment in two frames was too small to resolve, good estimate for 
the plane parameter could not be obtained. Though line features were tracked, end points 
could not be properly located and hence good estimate for the motion of the object could 
not be obtained. 
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1 Introduction 


The ability to detect and locate runways/taxiways and obstacles in images captured using 
on-board sensors is an essential first step in the automation of low altitude flight, landing, 
takeoff and taxiing phase of navigation of aircrafts. At present the autopilot system nav- 
igates the aircraft up to a certain height (depending on the autopilot system installed on 
the aircraft and the capability of the ground systems such as Instrument Landing System 
(ILS), Microwave Landing System (MLS) etc.) above the runway during landing after which 
the human pilot assumes control of the aircraft. At this point, the pilot makes decisions 
about landing depending on the external visibility. The landing is often aborted or delayed 
if the visibility is poor. Crew members rely on a sophisticated combination and fusion of 
information from multiple sensors in order to accomplish their mission under conditions of 
darkness, reduced visibility and bad weather. Hence, there is a strong need for an aircraft- 
based Synthetic Vision System (SVS) to complement the ground-based systems in various 
functions such as detection and prevention of potential runway collisions, airport surface 
navigation, and landing and takeoff in all weather conditions [17, 107]. 

Implementing an SVS would upgrade the capabilities of over 1100 US airports to CAT 
Ilia 1 or better, which are currently equipped with CAT II 2 or CAT I 3 capability [24, 107]. 

Another motivating factor for the development of a SVS is the design of a supersonic 


x The minimum vertical visibility ceiling is 12-35 feet with runway visual range of 100-300 meters. A fully 
automatic landing system with automatic flare and a failure survival autopilot system with a probability of 
catastrophic failure less than 10“ 7 per hour is required. Pilot assumes control at touchdown. 

2 The minimum required vertical visibility ceiling is 100 feet and runway visual range is 400 meters. A 
fail passive autopilot is required. Pilot takes over landing at a height of 100 feet. 

3 There is sufficient vertical visibility at a height of 200 feet with a runway visual range of at least 
800 meters for the pilot to carry out a safe landing under manual control. A simplex autopilot system is 
acceptable. Pilot assumes control at a height of 200 feet. 
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commercial aircraft. It is argued that a High Speed Civil Transport (HSCT) without 
a Concorde-like drooped nose could result in significant reduction in the gross takeoff 
weight[107, 116]. Without a drooped nose, the cockpit has limited external visibility and 
the pilot has to depend on an S VS which derives information from combination of sensors. 

The SVS is envisioned to be equipped with a Passive Milli-Meter Wave (PMMW) sensor 
in addition to the infra-red and video cameras. A PMMW sensor is designed to be operated 
at lower frequencies (e.g., 94 GHz), at which point the energy attenuation due to fog is known 
to be at a minimum, thus providing the ability to see through fog [132]. Outputs of these 
sensors will be integrated with aircraft position and orientation information (yaw, pitch, and 
roll) provided by the Global Positioning System (GPS) and the Inertial Navigation System 
(INS), and an airport geometry database to detect the runways/ taxi ways and obstacles, 
locate the aircraft within the airport, determine potential conflicts, issue advisories, and 
sound cockpit alarms. 

In this research, we address the problem of detection of ground-based obstacles in monoc- 
ular image sequences obtained from on-board sensors for use in an SVS. Any object in the 
sensor’s field of view is considered to be an obstacle. Most of the algorithms available in the 
literature for obstacle detection axe developed and evaluated for applications in robotics, 
road vehicle navigation, and intelligent vehicle and highway systems (IVHS). These appli- 
cations use high resolution and high quality images obtained using a video camera. Stereo 
images axe often used to estimate the distance to an obstacle. PMMW sensor images, how- 
ever, are of low resolution and poor quality. The use of stereo is not currently recommended 
for SVS due to the difficulty with having such a system with a large enough base distance 
to estimate the 3D position accurately. Hence, there is a strong need to design and evaluate 
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computer vision algorithms for use in SVS for obstacle detection using monocular image 
sequences. 

1.1 Overview 

This report is about detection of objects in monocular image sequences obtained from a 
moving sensor. The initial goal of this research was to design a computer vision system to 
analyze image sequences obtained from a Passive Millimeter Wave imaging system mounted 
on-board the aircraft to detect runways/taxiways and objects on the runway. Although 
PMMW sensors have good response in fog, their spatial resolution and radiometric contrast 
are very low, and the quality of the images obtained using these sensors is poor. However, 
we have additional data in the form of the airport model and approximate position and 
orientation of aircraft, to guide our system to locate objects in the low resolution image. 

The position of the aircraft is available from the GPS, and the orientation of the aircraft 
is provided by the INS. The data from these instruments are known only up to a certain 
accuracy. For example, GPS data are updated once every second and it is likely that a few 
such updates could be missed, potentially causing camera position data to be as much as 
a few hundred feet off. Knowledge of the camera motion information from the GPS/INS 
is the key to detecting objects and estimating their position and velocity. Any inaccuracy 
in this information could result in an error in the estimated position and velocity of the 
objects. 

An alternate approach to obtain an improved estimate of camera position is to use 
objects with known world coordinates and their position in the image plane (e.g., inter- 
sections of runway /taxi ways, corners of buildings, etc.). This requires an analytical study 
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of the relationship among the camera parameters, the resolution of the images, and the 
distances between the aircraft and objects. We propose an analytical model to compute the 
accuracy of the estimated camera position and develop equations for accuracy in terms of 
sensor resolution and sensor position in 3-D. We also propose a new algorithmic approach 
for computing error in the estimated camera pose due to plane quantization. Such theoret- 
ical analysis are useful in deciding the required camera resolution for use in the design of 
SVS for navigation. 

Initial experiments were conducted using a test image obtained from a single pixel 
camera located at a fixed point in space and then mechanically scanning it to obtain a 
50 x 150 pixel image. Using the camera position information and the airport model, a 
sequence of 30 frames was simulated for this experiment. Although results obtained using 
our algorithms on these images were encouraging, further research using PMMW sensor 
could not be continued for the following reasons: 

• A practical camera with an array of pixels could not be developed, 

• It is not clear at this time whether such a sensor could be safely mounted on-board 
an aircraft, and 

• Camera pose estimation from image-based features using low resolution PMMW sen- 
sor images was not better than the position information provided by the GPS. 

Hence, the remaining part of this research is focused on detecting independently moving 
objects in monocular images obtained by a moving video camera. In this work, we assume 
that the objects were moving on a background which is either planar or is piecewise planar. 
We describe a new recursive motion-based segmentation algorithm for segmenting images 
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into regions corresponding to independently moving objects and estimating their motion 
parameters. 

Apparent motion of brightness patterns in images, called the optical flow, is computed 
first. Since the camera is moving, background (i.e., the runway) appears to be moving in 
the image. Using knowledge of camera motion, flow vectors resulting from the runway are 
grouped based on the hypothesis that they are resulting from a single planar surface in 
motion Planar surface motion model parameters for the runway is computed using a least 
square model fitting approach. Flow vectors violating the planar motion model are detected 
as outliers and are identified as obstacles. 

Line features are considered to be prominent and more reliable than point features. Two 
types of line features are detected in image sequences obtained from a camera mounted on- 
board a landing aircraft - - line features due to the runway markers and line features due 
to the 3-D objects in the scene. Runway plane parameters are computed by matching at 
least two line features in two frames. Using the known camera motion information and 
the estimated plane parameters, line features in the image corresponding to the 3-D object 
in motion are tracked. A recursive Kalman filter-based approach is used to track the line 
features and estimate the position and velocity of the object in 3-D assuming that the object 
is moving on the planar runway. 

This research addresses many difficult problems in computer vision, such as motion- 
based segmentation, feature tracking, dynamic scene analysis, etc. In the past, researchers 
solved the problem of motion-based segmentation using the Hough method, image regis- 
tration using digital warping and other statistical methods. The digital warping approach 
requires knowledge of the warping parameters. It is mainly used to compensate for the 
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dominant motion; all features violating the dominant motion are detected as outliers or 
obstacles. This results in warping noise in addition to the digitization noise. The Hough 
method can be used to segment and estimate multiple motions simultaneously, although it is 
computationally expensive and its success and accuracy are mainly dependent on knowledge 
of the range and resolution of the parameter space. 

Unlike the Hough-based approach, our approach does not require prior knowledge of 
the range of parameter values for the motion model. Instead, parameter values are directly 
estimated. This estimation is refined by removing the outliers at every iteration. Using our 
algorithm, basic segmentation can be done without knowledge of the motion parameters, 
as opposed to the warping technique, which requires the camera motion parameters to 
compute the warping matrix. To deal with the computation of multiple motions, we propose 
incorporation of split and merge technique to the motion-based segmentation algorithm. 

The algorithms developed in this work have been tested using both real and synthetic 
images. The synthetic images were generated by the program developed for this report, 
as well as a simulation software developed at the NASA Ames Research Center. Several 
real image sequences used in this work were obtained by sensors mounted on-board a land- 
ing aircraft, and were provided by the NASA Ames Research Center and NASA Langley 
Research Center. 

1.2 Review of the Literature on Obstacle Detection Systems for Naviga- 
tion 

In recent years, considerable effort has been put into evaluating the feasibility of computer 
vision algorithms for these tasks. This section gives a brief overview of research work done 
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at NASA and other institutions in the area of obstacle detection for navigation. 

The detection of runway lines has been attempted in [55, 77] using properties specific 
to the runway (e.g., the use of parallel lines). Work described in [30] is based on a model of 
the runway where the camera image is compared with the expected runway image obtained 
by generating an image using the known position of the camera and the airport runway 
model database. On-board INS and GPS can provide reasonably good position estimation 
during flight, but they are not considered accurate enough to permit automatic landing 
[101]. Image-based features such as points, lines, etc., axe used to determine the position of 
a sensor in [4, 61, 70]. A number of authors have attempted to determine the position of a 
sensor from the captured image [4, 61, 69]. 

Most approaches have been based on some model of environment that is expected by the 
sensor [5, 60, 73]. The position of the sensor is computed from the actual sensor input and 
some perceived difference between the actual and the expected sensor input. The choice of 
a reasonable model for generating the runway scene, as well as the choice of a cost function 
defining the goodness of fit, are crucial to the accuracy and speed of convergence of such an 
algorithm. In [102] three different models - - namely edge-based model, area-based model, 
and texture-based model - - are used to compare the viewed scene with an expected scene. 
A cost function quantifying the matching of the expected image with the camera image is 
minimized to obtain more accurate camera position. 

In principle, the problem of obstacle detection with a moving camera can be completely 
solved if either the range map is supplied for all points in front of the camera, or the 
image acquired using the sensor is segmented and recognized into constituent parts (i.e., 
sky, runways, buildings, etc.). A Kalman filter-based recursive estimation procedure for 
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estimating range to feature points in the image is explained in [95, 108, 110, 111]. Image 
regions of size 11x11 pixels with high variance are detected as features. The initial range 
of these features is computed using motion stereo. The feature positions in the subsequent 
frames are predicted, and the estimates of ranges to the feature points are refined using a 
Kalman filter. The algorithm is tested and the results are reported for both indoor images 
and images acquired from the actual flight test. In this work, all obstacles are assumed to 
be stationary. The algorithm was modified to handle stereo images in [97]. Motion-based 
and stereo-based algorithms for passive range estimation are compared in [109]. 

The range estimation procedure described in [118] uses a simple incremental weighted 
least squares method for estimating the position of stationary objects using known camera 
state parameters. This algorithm extends the epipolar plane image analysis described in 
[14, 8] to general camera motion by assuming the camera motion to be piecewise linear. In 
[120] an optical flow-based approach is used for detecting independently moving objects. 
Image regions corresponding to the independently moving objects are segmented from the 
background by applying the constraint filtering originally proposed by Nelson [87] on the 
optical flow computed from the initial few frames of the sequence. These detected regions 
are tracked over subsequent frames using a model-based tracking algorithm described in 
[56]. Position and velocity of the moving objects in the world coordinate is estimated using 
an extended Kalman filter. 

The obstacle detection algorithm described in [115, 44] assumes that the obstacle-free 
runway is a planar surface and computes the residual flow by compensating for the camera 
motion using image warping. The algorithm is tested using images obtained during a 
helicopter flight. 


8 



1.3 Organization 


This section provided a brief introduction to the research problem and objectives addressed 
in this report. Even though this report is based on the main theme of object detection 
in monocular image sequences, the total research dealt with processing images obtained 
from two different sensors. Detecting obstacles in images obtained from sensors of different 
modalities require different processing techniques. The next two sections explain our work 
using PMMW sensor images. Subsequent sections describe the work done using video image 
sequences. 

Section 2 describes the analytical model for computing the error in the position and 
orientation of the camera estimated using image-based features. An analytical model is 
described for an on-board sensor in terms of its position and orientation, and other sensor 
internal parameters. Equations for error in camera ^parameters axe also derived. Three 
sensors - - the Passive Millimeter Wave (PMMW) sensor, the Forward Looking Infrared 
(FLIR) sensor, and the High Definition Television (HDTV) sensor - - are evaluated and 
quantitative results are presented. 

A model-based approach for detecting objects in a PMMW sensor image is described 
in Section 3. Results were obtained using an image sequence obtained from a single pixel 
camera provided by the NASA Langley Research Center. Since a practical PMMW sensor 
array was not completely developed by NASA’s commercial partner, we could not continue 
using this sensor modality. The remaining sections of this report use video images. 

Section 4 describes our motion-based segmentation algorithm for segmenting images 
into regions corresponding to different motions and estimating their motion parameters. 
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An algorithm for computing the optical flow from image sequences is described. A planar 
motion model and a linear algorithm for recovering the model parameters from optical flow 
vectors are presented. A recursive algorithm for estimating single and multiple indepen- 
dent motions is described. The results obtained using both synthetic and real images axe 
presented. 

Algorithm for estimating the plane parameters using line features is described in Section 
5. Details are given for a Kalman filter-based approach for tracking line features in the 
image sequence and estimating the position and velocity of 3-D objects moving on a planar 
runway. The feasibility of these methods was tested using real image sequences; the results 
are presented in this section. Section 6 contains a summary of the entire research and 
suggests future research topics. 
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2 Sensor Sensitivity Evaluation and Calibration 


A Synthetic Vision System for enhancing the pilot’s ability to navigate and control the air- 
craft during landing and taxiing operations uses an on-board airport database and images 
acquired by external sensors such as millimeter wave, infrared and low light TV cameras. 
Additional navigation information needed by the system is provided by the Inertial Naviga- 
tion System (INS) and Global Positioning System (GPS). The data from these navigation 
instruments are known only to a certain accuracy (depending upon the type of the instru- 
ments used), and are updated once every second. Since data from on-board instruments 
are more accurate than the GPS-based data, they are useful for obtaining more accurate 
camera position. An alternative approach is to use the 3-D location information of certain 
landmarks such as intersection of runways/taxiways, corners of buildings and their corre- 
sponding positions in the image to obtain a better estimate of the camera position. This 
requires an analytical study of the relationship among the camera positional parameters 
(i.e. position and orientation in 3-D), the sensor characteristics, and the relative distance 
between the sensor and objects. 

2.1 Sensor Positional Sensitivity Evaluation 

Imaging is a process of mapping a large 3-D scene onto a small 2-D plane. This 2-D 
image plane is quantized in both spatial directions to facilitate image processing by digital 
computers. The limited resolution of the sensor requires that the entire scene be mapped 
to an M x N array of points called pixels. Such an array being too small to adequately 
represent the scene introduces significant amount of error into computations involving the 
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locations of image points and features. Another natural outcome of the imaging process is 
the loss of depth information resulting from mapping of an infinite number of points on a 
line-of-sight onto a single point in the image plane. As a result, reconstruction of a scene by 
computing the depth of scene points using image-based features is a challenging problem. 
The inverse process, known as the camera calibration problem, is to compute the camera 
position by triangulation between known scene points in the world and their corresponding 
positions in the image plane. Nearly all of the methods for solving this problem require 
only one monocular image with special marks which can be man-made point features or line 
features [42, 46, 54, 64, 69, 113]. The accuracy of such computational approaches, however, 
depends upon the camera geometry and the quantization of the image plane. 

Most analysis for obtaining three-dimensional position information of the scene points 
are based on the triangulation system [13, 76]. To compute the depth to a scene point, 
triangulation must be performed using its projection onto two images, captured either 
simultaneously, by two cameras separated by a base distance (as in the case of binocular 
stereo), or separately, by a single camera at two different positions (as in the case of motion 
stereo). This requires computation of features in one image and a correspondence in the 
other image. Assuming correct matches have already been found, the next step is to recover 
the 3-D information using triangulation. The accuracy of such a computation depends on 
the spatial quantization of the image plane. 

Equations for calculating error in distance measurement between an object and a stereo 
vision system established in [76] present a worst case error analysis, and show that the 
percentage error in distance measurement is inversely proportional to the number of pixels 
occupying the shift between the two images and directly proportional to the object dis- 
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tance. Blostien and Huang [13] have developed equations to determine the probability that 
a certain estimate is within a specified tolerance given the camera geometry of a stereo 
setup. Error equations developed in [31, 32] for determining the optimum line width for 
visual navigation of an autonomous mobile robot give the percentage of error for any sensor 
geometry, line width, and error conditions. Even though their analysis looks somewhat 
similar to our work, the basis for their analysis was measurement of line width. 

In this research, we estimated the aircraft position by tracking known static scene points 
in the im age plane. Since the accuracy of this estimation depends on various sensor pa- 
rameters, we wanted to develop an analytical model that could relate the accuracy of the 
estimated sensor position to the sensor positional parameters and attributes of the captured 
image. The sensor positional parameters include range, cross range, altitude and pitch, roll, 
and yaw angles. Sensor imaging attributes include the number of pixels in the image and 
the optical angular view (measured in degrees). 

We assume a pin-hole camera and, therefore, ignore camera lens distortion and optical 
non-linearities. We also assume that the problem of feature correspondence was solved 
using some correspondence algorithm and that the correspondence was accurate to a single 
pixel. Hence, the analysis does not depend on the type of algorithm used for establishing 
feature correspondence. Any error in computation was basically due to spatial quantization 
of the image plane. In this analysis, we projected the image plane onto the ground, and 
the pixel (p,q) in the image plane was modeled as a patch on the ground plane (ground 
area represented by the pixel). We defined the sensitivity or the error in computation as 
the minimum change in a camera parameter that would move a fixed ground point to the 
next pixel in the image plane. 
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Figure 1: Airplane body fixes 


2.1.1 Imaging Geometry 

Throughout the analysis, for convenience, we assume that the sensor is located at the 
aircraft’s center of gravity. Hence, we can use the terms sensor position and aircraft position 
interchangeably. We also neglect the effect of curvature of the earth. 

The system of reference axis that forms the basis of the system of notations used to de- 
scribe the position of the sensor is shown in Fig. 1. The figure shows an aircraft with three 
mutually perpendicular axes (pitch, roll, and yaw) passing through the aircraft’s center of 
gravity. The position of the aircraft is defined with respect to the three mutually perpen- 
dicular world coordinate axes X, Y, and Z. The image plane is assumed to be perpendicular 
to the rolling axis, with its vertical and horizontal axes coinciding with the yawing and the 
pitching axes of the airplane, respectively. 

Fig. 2 shows an imaging situation during landing, where the aircraft is at (X c , Y c , Z c ), 
with pitch angle 6, zero yaw, and zero roll angle. Let a = 90 — 6. The field of view of the 
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camera is determined by two viewing angles: A a defined in the same plane as 6, and A/3 at 
right angles to Aa. (A a determines the vertical extent of the image and A/9 its horizontal 
extent.) Even though the image obtained by the sensor is always a rectangle, the ground 
area captured by the sensor is a trapezoid ABCD whose side length and area depend on 
Aa, A/3, and various other sensor parameters like position, orientation, etc. Note that a 
pixel in the image plane corresponds to a patch on the ground plane. We refer to this as a 
pixel-patch (See Fig. 3). 

2.1.2 Sensitivity Analysis 

Consider a point feature which has been detected at some pixel (p,q ). Let the actual world 
coordinates of this feature be (P, Q,0). Since a pixel represents a patch on the ground, the 
camera could change in its position by a certain amount while still retaining the image of the 
feature at the same pixel {p,q). Hence, a camera pose estimation by passive triangulation 
will always give the same camera pose for nearby camera positions, unless the change in 
the cam era position is large enough for the feature to be observed in the neighboring pixel. 
We define this minimum change in camera displacement as the sensitivity of the camera. 
Note that this is a measure of accuracy of camera position estimate and is a function of the 
camera position, image size in number of pixels, angular resolution, and the pixel location 
( p , q) in the image plane. 

Let N x and N y represent the number of pixels in the vertical and horizontal directions, 
respectively. The pixels are numbered -^,...0,... ^ - 1 in the vertical direction and 
-^, ... 0, ... ^ - 1 in the horizontal direction. The rolling axis of the plane is assumed to 
pass through the bottom right corner of the patch on the ground plane, which corresponds 
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Figure 2: Image obtained by the sensor is projected towards the ground 
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Figure 3: Ground area covered by the sensor. Each small trapezoid corresponds to a pixel 
in the actual image 


to the center pixel in the image plane. Other pixels are referenced in a similar manner. 
The coordinates of the reference corner of the ground area covered by a pixel (p, q) can be 
estimated by the following relations: 


A ot x 

X = X c + Z c tan(a + p— ) 

IV x 


Y = Y c + 




C T — , „Aa > tan (9Tr) 


COS 


( a +w) N y 


(i) 


For a non-zero rolling angle <j>, the ground coordinates (X', Y') which correspond to a pbcel 
(p,q) in the image plane axe obtained with replacing (p, </) in the equation by (p',q '), where 


p' = p cos (f> — q sin <f> 

q' = psin<^ + qcos^» (2) 


Since a pixel-patch is referenced by the bottom right corner of the pixel, the other three 
corners become the reference of its three neighboring pixel patches, as shown in Fig. 4. 
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(p. q+D 

Figure 4: A pixel (p, q) projected towards the ground 


Thus, the four corners of this pixel-patch i = 1,2, 3,4 are obtained by using Eq. 

1, where (p, q) are replaced by (p', q'), such that 


p- = pi cos <f>- qt sin </> 

q'i = pi sin <f> + qi cos <f> (3) 


where (pi,qi) = (p, g), (P 2 )? 2 ) = (p+ (P3> Qi) = (p + 1>9+ 1)> iPiiQi) — (Pi9 + !)• 

Eq. 1 gives the relationship between the camera parameters (X C ,Y C , Z c ,9,<p) and the 
ground point corresponding to a pixel (p,g). We are now interested in computing the 
sensitivity of the imagery sensor. This is defined as the minimum change in a camera 
parameter that would move a fixed ground point to the next pixel in the image plane. We 
obtain this by taking the partial derivative of X[ and Y[ with respect to the corresponding 
parameter. For example, 




( 4 ) 
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This derivation is an approximation to the amount of change in X[ for unit change in 
X c . Thus, we estimate that the amount of change in X c in order to change X[ to X 2 or Y{ 
to Y 2 (which define the corners of adjacent pixels) as 




(n - yd 


(5) 


Note that S^ c = oo, as expected. Sensitivity with reference to other parameters is defined 
in a similar manner. These are summarized in Table 1. In general, 5* stands for sensitivity 
in the direction i due to the sensor positional parameter j computed at pixel (p, q) in the 
image plane. 

Sensor sensitivity is a function of various sensor parameters and sensor attitudes. Since 
the sensor plane is inclined to the ground plane, the sensitivity varies in the vertical and 
horizontal directions, along the sensor plane and, hence, is a function of pixel number (p , q). 
Equivalently, the accuracy of estimation of sensor position using ground truth data is a 
function of pixel position as well as other parameters. For a given range, the estimation 
using features that are observed at the top half of the sensor are less accurate because of the 
large ground area represented by these pixels. Also, for a given p, the accuracy decreases 
as we move towards the border of the sensor in the horizontal direction. In summary, the 
accuracy of estimation is a function of sensor characteristic and the ratio of the sensor view 
angle to the number of pixels in the image. 
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A - cos(a+p[j%y B ~ tan(?/l N v ) 

H tan(a + p'i ){~P sin (p — q cos 4>) cos(a + p\ ) 

§§ = ^ (p cos (p — q sin <p) cos 2 (q[ ) 

p[ — pi cos <t>-q\ sin (f>; p\ = Pi cos (f> — q\ sin <f> 

q[ = pi sin <j) + qi cos <f>; q' 4 = Pa sin (f> + q\ cos <p 

a = 90 + 0; (pi,?i) = (p,q); (P4,<?4) = (p>g+ 1) 


Table 1: Sensor positional sensitivity equations 
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2.1.3 Quantitative Results and Discussion 


The sensitivity analysis described in the previous section was applied to three different 
sensors at six different positions. Characteristics of the sensors are given in Table 2. Sensi- 
tivities Sx c , Sy c , and at the aim point (i.e., p = 0,q = 0) for various sensor positions 
are plotted in Figures 5, 6, and 7, respectively. In all of the above cases, pitch angle 
is -3.0°, roll angle is 0°, and cross range is 0 feet (typical values when an aircraft is ap- 
proaching the runway for landing). Note that Sf c is larger than 5^. at (0, 0) and, hence, a 
feature would move to the next horizontal pixel before it moves to the next vertical pixel. 
Thus, only is important. 

As expected, the sensitivity is best for the sensor with the highest pixel resolution. 
Sensitivity also improves as the sensor is moved closer to the ground. It becomes poor 
for the features that are located at the far end of the vertical axis (top of the sensor i.e., 
for the objects that are located at the far end of the runway). As expected, the position 
and velocity of the aircraft can be computed more accurately by knowing the position of 
stationary objects on the ground that are closer to the aircraft. 

The above results indicate that the accuracy of camera state estimation would be no 
better than the GPS unless a high resolution sensor is employed. Note that these results 
do not consider potential improvements that can be obtained by motion stereo techniques 
using a large number of image frames, or by using more feature points than the required 
minimum. In addition to being useful in sensor design, this analysis will also help us evaluate 
the accuracy of camera state estimation by any algorithm that uses image-based features 
that correspond to known scene points. 
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Sensor Positional Parameter 

Sensor Characteristic 

Location 

Range 

Altitude 

Sensor Type 

Pixel 

Field of View 


in ft. 

in ft. 


(H x V ) 

(. H x V)deg 

Threshold 


50.0 

HDTV 

1920x1035 

30x24 

CAT II-DH 

908.1 

100.0 

FLIR 

512x512 

28x21 

CAT I-DH 

2816.2 

200.0 

MMW 

80x64 

27x22 

Middle Marker 

4500.0 

288.2 




1000’ Altitude 

18081.1 

1000.0 




Outer Marker 

29040.1 

1574.3 





Table 2: Sensor positional parameters and sensor characteristics 



Figure 5: Sensitivity in the direction of range 
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Figure 6: Sensitivity in the direction of cross range 



Figure 7: Sensitivity in the direction of altitude 
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2.2 Camera Calibration from Image-based Features 


Camera calibration is the problem of determining camera parameters - including location 
and orientation - from images. Camera parameters can be grouped into two categories: 
extrinsic parameters , which provide information regarding the camera position and orien- 
tation with respect to a reference world coordinate system, and intrinsic parameters , which 
include focal length and scale factors in units of pixels in the image plane. 

A number of methods for estimating the camera pose are available in the literature. 
They can be divided into two classes: 

• The most common approach to estimating the camera pose has been to use point and 
line correspondences. These point or line features arise either from certain man-made 
objects placed in the scene for the purpose of calibration or from certain landmarks 
that already exist in the 3D scene. Estimation is done either by applying triangulation 
to known scene points and their corresponding image points, or by developing and 
solving a set of equations that relate the scene points, the corresponding image points, 
and the camera parameters. Earlier research on this approach was done by Wolf 
[130], Fischler and Bolles [39], and Ganapathy [43]. Liu et al. [70] showed that 
the computation of the rotation and translation vector of the camera are separable. 
Chou and Tsai [23] used house corners as marks and Haralick [46] used the corners 
of rectangles of unknown size to determine the camera view angle parameters. Chen 
et al.[21] proposed using a cube as a calibration object, whereas a method that uses 
rectangular parallelepiped with known dimensions as the calibration object is proposed 
in [113]. For calibration of a camera mounted on an autonomous land vehicle running 
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on an outdoor road, Liu and Deng [66] used road boundaries as calibration objects. 
Fukuchi [42] used special man-made marks with certain constraints for determining 
the position of a robot using standard shaped marks. 

• Objects with curves have been used for camera calibration, such as a plane with conic 
or polygon arcs as in [48] and semi-circle in [65]. Magee and Aggarwal proposed a 
method that uses a calibrated sphere [75]. 

All of the above procedures involve two steps. The first step is to locate feature points on 
the image plane that correspond to the known 3-D points, and the second step is to formulate 
and solve a set of equations that relate the scene points and the image points, thereby 
satisfying certain constraints. These computational approaches assume an ideal pinhole 
camera and model the pixels as points of insignificant dimension, largely ignoring the error 
introduced by the image plane quantization. More accurate calibration of the camera, or a 
priori knowledge about the error introduced by the image plane quantization, is quite critical 
in various aspects of computer vision like object recognition, scene reconstruction, robot 
navigation, etc. A brief review of some of the past work in error analysis of triangulation 
to image plane quantization is given in the next section. 

2.2.1 Error due to Image Plane Quantization 

Quantization of the image plane as well as the intensity levels have significant impact on the 
outcome of various computational approaches to solving computer vision problems. The 
problems are particularly important when the images are captured using low resolution 
sensors. The impact of quantization error on computer vision was addressed as early as 
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1969. A report by Hart on stereo-scopic calculations discusses sensitivity of triangulation 
process to pan, tilt, and quantization errors [49]. McVey and Lee developed equations for 
measuring the worst case error in calculating the distance between an object and a stereo 
vision system [76]. 

The navigation system reported in [31, 32] uses a single continuous strip painted on the 
floor marking the robot’s route. For successful navigation of the robot, an important design 
consideration was the width of the line. That study analyzed the effects of various error 
conditions on the width of the line, as seen in the image plane, to determine the optimal 
line width on the floor. 

The effect of image plane quantization on the determination of object location using 
stereo set-up was analyzed in [13, 85]. These works assumed that the scene point was 
equally likely to be everywhere within the volume formed by the lateral [13] or axial [85] 
stereo triangulation method, and derived the probability distribution of the errors in all 
three component directions. 

Later in this section, we describe a new approach for determining the error in the camera 
state. Our approach is based on the fact that two or more lines-of-sight connecting the scene 
points and the corresponding image pixels meet at a single point known as the focal point. 
Due to finite size of the pixel, for each scene point and the corresponding pixel, we can 
consider four lines-of-sight passing through the four corners of each pixel. Two or more 
scene points and their corresponding image pixels result in a polyhedron within which the 
focal point is expected to lie. The volume of this polyhedron is proportional to the error 
in the camera pose. We propose to minimize this volume by considering more points. The 
selection of proper scene points — and also the number of scene points — reduces the region of 
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uncertainty in the determination of the camera pose. This analysis can be used to determine 
good calibrating points, their distribution, and the number of scene points required to stay 
within the allowed range of computational error for a particular vision problem. In situations 
where it is not possible to determine the calibrating points in advance, this analysis can be 
used to compute the error in the estimated camera position. Our analysis can be used to 
design algorithm to select good feature points for calibration from available pool of feature 
points dynamically. 

2.2.2 Analysis of Image Plane Quantization Error 

Determination of camera external parameters using point-based features requires a certain 
minimum number of scene points (Si, S 2 , S 3 , . . . , S n ) and their corresponding image points 
(h,h, h, ■ ■ • , 40- Basic triangulation of these scene points and the corresponding image 
points requires all the lines-of-sight to pass through the focal point of the camera lens. 
Under ideal conditions (ignoring the effects of quantization of the image plane and lens 
distortion), for a given set of scene points and their corresponding image points, there can 
be only one unique position for the camera in the world coordinate system. Due to the 
finite size of an image pixel, however, a given scene point can be projected onto the image 
plane through an infinite number of points within the finite sized pixel. When any two such 
scene points axe projected onto the image plane, the intersection of these lines of sight form 
a polyhedron of finite volume within which the focal point of the camera is expected to lie 
anywhere, as shown in Fig. 8. The approach presented here computes the range of values 
for camera state parameters (i.e., the 3-D coordinates of the center of the image plane in the 
world coordinate system and its orientation with respect to the reference axes that would 
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Figure 8: Lines of sight forming a polyhedron due to image plane quantization 

satisfy the constraint for the valid location of the focal point). 

As explained in the previous section, due to image plane quantization, basic triangulation 
of known scene points and their corresponding image points forms a polyhedron of finite 
volume within which the focal point of the lens is expected to lie. Since the camera’s 
extrinsic parameters (position and orientation) are uncertain, it is not clear where this 
polyhedron is located in the world coordinate system. However, it is known that the focal 
point is at a perpendicular distance / (focal length) in front of the image plane along the 
optical axis of the camera, and that the scene point S t corresponds to image point 7j in the 
image plane. Hence, there are ranges of values for each camera parameter that would satisfy 
the above two constraints. In the next two sections we present the mathematical treatment 
of the above concept, to find the range of values for each camera parameter separately. 

The problem addressed here is different from the well known problem of determination 
of error in computing the 3-D position of the objects using a stereo system [13, 85], where 
the cam era position is fixed and known and, as a result, the region of uncertainty for the 
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object location is well defined by the camera geometry. However, in our problem scenario 
the location of the region of uncertainty is not defined a priori, since the camera position 
itself is uncertain. Although the problem discussed in [31, 32] is somewhat close to the 
problem being addressed here, their analysis is limited to a particular application. The 
approach taken in this work does not make any assumptions about the kind of outdoor 
scene points available for calibration. 

Fig. 9 shows the system geometry used in this analysis. Two separate coordinate 
systems are shown: the coordinate (X, Y, Z) represents the world coordinate system, and 
the coordinate (it, v) represents the coordinate system in the image plane. (X c , Y c , Z c ) is the 
camera position in the world coordinate system and ( 6 , ip, <p) are, respectively, the tilt, yaw, 
and roll angles of the camera. We follow the convention given in [45]: the term “camera 
position” or “sensor position” means the position of the center of the image plane, and the 
focal point is in front of the image plane. In the above geometry, we assume the gimbal 
center offset to be zero and Z = 0 to be the ground plane. Therefore, the height of the 
sensor is Z c . Let A u and A v be the size of a pixel along the u and v directions, respectively, 
on the image plane. Let / be the focal length of the camera and M x TV be the number of 
pixels along the u and v directions of the image plane, respectively, with the center pixel 
numbered as (0,0). We make the following assumptions in this analysis: 

• This work analyzes the accuracy in obtaining the camera pose by triangulation of 3D 
scene points on a single quantized image plane. Image plane quantization is assumed 
to be the only source of error in computation of the camera position. 

• A pinhole camera model is assumed [59], thereby ignoring camera lens distortion and 
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Figure 9: Camera geometry 
other optical non-linearities. 

• The problem of detecting the image point that corresponds to a given scene point is 
assumed to have been solved. The world coordinates of the scene points used in the 
calibration process are assumed to be accurate. 

• The focal point is in front of the image plane, as is the case in any practical camera. 
Assuming the image plane to be in front of the lens center is not appropriate for this 
analysis, since this might introduce significant error in the volume of the polyhedron 
formed - especially when the scene points are close to the camera. 

Analysis is carried out in the following four steps: 

• Ass um e the camera position vector to be (X C ,Y C , and develop equations 

for the world coordinates of the corners of the image pixels that are to be used in the 
calibration process. 


30 


• Form the plane equations that contain the scene point and the two adjacent corners 


of the pixel corresponding to the scene point. 

• Compute the coordinates of the focal point in the world coordinate system for the 
assumed camera position vector. 

• Verify that the focal point lies within the polyhedron formed by the intersection of 
the above planes. 

The mathematical treatment of the above four steps are described in detail below. 

A. Computing the world coordinates of the comers of image pixels 

We assume the image plane to be at the origin of the world coordinate system, with zero 
tilt, roll, and yaw, as shown in Fig. 10. In this situation, the image plane coincides with the 
XZ plane and the world coordinates of a pixel (i,j) in the image plane are (iAu, 0, jAv). 
When the camera is rotated, the coordinates of the pixel (i, j) in the world coordinate sys- 
tem can be computed by applying necessary point transformations [45] to the original point 
as given below. 



31 



z 



Figure 10: A point ( i,j ) in the image plane rotated about (X,Y,Z) 


where 
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The individual elements of the rotation vector can be computed by premultiplying the 
following three rotation vectors. 
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( 8 ) 


where C stands for cosine and S for sine of the respective angles. 

Since we are concerned with computing the world coordinates of the corners of any pixel 
in the image plane, we can think of the imaging process as rotating every point in the image 
plane by applying proper rotation vectors, with the center of the image plane as the origin of 
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the world coordinate system [45], and then translating them to a new position in the world 
coordinate system by applying proper translation vector. Therefore, for nonzero values of 
the camera position vector, the world coordinate of a pixel (i,j) in the image plane is given 
by 


/ \ 
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Xij 
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x c 

Yij 
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Zij 
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( 10 ) 


The 3D coordinates (Xfj, Yjj, Zjj) of the four corners of the image pixel (i,j) in the world 
coordinate system can be computed by the following equations. 



( 11 ) 


where k = 0, 1,2, 3 stands for the four corners of the pixel Values of k\ and k% are 

decided by the value of A; as shown in Fig. 11 . 

B. Form the plane equations 

The second step in the analysis is to form the plane equations that contain the scene point 
and the two adjacent corners of the pixel in the image plane that correspond to the scene 
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Figure 11: Pixel (i,j) in the image plane with numbers assigned to the corners 


point (see Fig. 8 ). Let ( X p ,Y p ,Z p ) be the coordinate of the scene point p whose image is 
formed at pixel ( i p ,j p ) in the image plane. The coordinates of the four corners of the pixel 
(i p , j p ) can be computed using Eq. 11 as 



(12) 


Two adjacent corners of the pixel (i p) j p ) and the scene point ( X p , Y p , Z p ) form a plane. 
Every scene point and the corresponding image pixel gives rise to four such planes. Each of 
these planes can be represented by plane equations that contain the three points ( x p ,y p ,z p >. 
(X Jem 0 d 4 P y fcmod 4 p z fcmod 4 P) (X (fc+i)mod4 P y (*+i)mod4p z (fc+i)mod< P) The equation for the plane can 

be written as 

a$X + ti$Y + <?*Z + = 0 (13) 
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p = 0, 1, 2, . . . , n axe the scene points and (a. pk , t% k ,c% k , <P t k ) are the plane parameters. 


C. Find the world coordinates of the focal point 

The focal point of the camera in our model is at a perpendicular distance / in front of the 
image plane along the optical axis of the camera. The world coordinates ( Xf,Yf , Zf) of the 
focal point for a given image plane position (X C ,Y C , Z c ) can be computed (as in Step 1 of 
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this analysis) by use of necessary transformation: 



( 15 ) 


D. Find the range of values for camera parameters 

Four planes formed by a scene point and its corresponding pixel edges in the image plane 
intersect with other similar planes formed by other scene points and their corresponding 
image pixel edges, thus forming a common volume of a polyhedral shape. For a given 
camera position, the estimated focal point should lie within this volume (see Fig. 8 ). One 
way to check for this is to compute bounding surfaces of the polyhedron and then verify 
that the focal point lies within the polyhedron. This method is computationally expensive, 
since it is not easy to determine the coordinates of the vertices of the polyhedron formed 
by the intersection of many planes, especially when some of the planes may lie outside the 
polyhedron formed by the other planes. Instead, we extend the concept of solution set of 
linear inequalities [81] to determine if the point is within the polyhedron. 

Solution set of Linear Inequalities 

The solution set or graph of the inequality ax + by + cz + d>Q consists of all the points 
that lie on or above the plane ax + by + cz + d = 0 if c > 0; and consists of all the points 
that lie on or below the plane ax + by + cz + d = 0 if c < 0. Similarly, the solution set of 
the inequality ax + by + cz + d< 0 consists of all the points that lie on or above the plane 
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ax + by + cz + d, = 0 if c < 0; and consists of all the points that lie on or below the plane 


ax + by + cz + d — 0 if c > 0. 


z 



Figure 12: Two of the four planes formed by a scene point and its corresponding image 
pixel 

Fig. 12 shows how the above concept can be used to verify whether the focal point 
is within the polyhedron. (For clarity, only one scene point and its corresponding image 
pixel is shown with only the two of the four planes (plane 0 and 2) formed by the scene 
point and the corresponding pixel edges.) According to the figure, the focal point has to lie 
below plane 2 and above plane 0. Hence, for ( Xf , Yf, Zj) to lie within the common volume 
formed by the intersection of the planes, every plane has to satisfy the following inequality: 
for planes 0 and 2 


a%jX f + Y f + <%jZ f + <fi-< 0 if Cij > 0 

agX f + t$Yf + <$Zf + itg >0 if Cij < 0 
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for planes 1 and 3 


a\-X f + tfjYf + <$Z f + d£ > 0 
c$Xf + %-Yf + <$Z f + d$f < 0 

The values of camera parameters are varied independently and separately around their true 
values to find the range of possible values for the camera parameters that will satisfy the 
above set of inequality equations formed by a set of n scene points and their corresponding 
image points. 

2.2.3 Experimental Results 

The above analytical procedure is used to determine the calibration error for a simulated 
camera geometry. The analytical procedure for simulating the test data is described in the 
following section. 

2.2.4 Data Simulation 

For verification of this analysis using simulated data, a camera with an M x N pixel sensor 
with sensor elements of size A u x Au is considered. Let / be the focal length of the 
sensor. For a given camera position vector (X c ,Y c ,Z c ,d,ip,<j)), a number of scene points 
with coordinates (X p , Y p , Z p ) that are computed to lie within the angular view of the camera 
are picked randomly. The coordinates of these scene points with reference to the camera 


if Cij > 0 
if Cij < 0 


(17) 
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axes axe computed by applying the following transformation 
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where R~ l = R^ 1 Rq 1 R^ 1 . (Camera geometry is assumed to be the same as in the previous 
section where w is the optical or viewing axis of the camera.) (See Fig. 9.) is 

the coordinate of a scene point with respect to the camera axes (u, and (X P ,Y P , Z p ) 
is the scene point whose image coordinate is to be computed. 

The position of the image point on the image plane after perspective transformation is 
given by 
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The coordinate of the image point in terms of pixel numbers is computed by applying proper 
image plane quantization as 


u 
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(20) 


The set of scene points (X p , Y p , Z p ) and their corresponding image coordinates ( i p ,j p ) are 
used to compute the accuracy of the camera state estimates using the analytical procedure 
described in the Section 2.2.2. 
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2.2.5 Results and Discussion 


Three sets of tests were conducted to evaluate the effect of quantization on camera state 
estimation using point-based features. The objective of the first set of tests was to compute 
the error in various camera parameters as one of the camera parameters was continuously 
varied over a range of values. Figures 13(a) through 13(f) show the estimation error for 
each of the external parameters [i.e., (X c , Y c , Z c , 4)] using two fixed ground points, as 
the camera was moved along the Z direction with zero roll and yaw, and a tilt of 60°. A 
high resolution camera with a pixel width of 0.005 x 0.005 cm and focal length of 7.5 cm 
was considered. The coordinates of the two ground points used in the triangulation process 
were (10, 48, 0) and (-10, 68, 0). The error in computing the camera state increased as the 
camera moved away from the calibrating scene points. Similar results were obtained for 
different sets of calibration points and also when the camera moved in X and Y directions. 
The serrated nature of the error curve was also observed in the stereopsis experiment carried 
out by Hart [33]; similar arguments can be used to explain the process here. 

To study the variation of error with image resolution, the above set of tests were repeated 
for three sensors with different resolutions. The experiments were repeated for sensors with 
pixel size 0.0005 x 0.0005, 0.01 x 0.01 and 0.02 x 0.02. The error in computing the three 
positional parameters (i.e., (X C ,Y C) Z C )) for each of the sensors is plotted respectively in 
Figures 14(a) through 14(c) . In each of these cases, the camera was moved along the 
Z axis. Two ground points were used in the triangulation process, and the camera was 
assumed to have a tilt of 60°, with zero roll and yaw. As expected, the error in the camera 
pose estimation increased as the pixel size increased. 
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The objective of the third set of tests was to observe the effect of selecting more scene 
points for calibrating the camera. Figures 15(a) through 15(c) show the error in computing 
the positional parameters ( X c , Y c , Z c ) of the camera by considering two and three calibrating 
points. A camera with pixel size of 0.01 x 0.01 was assumed with a tilt of 45°, zero roll and 
yaw. The coordinates of the three points used in the triangulation were (0, 46, 0), (4, 46, 
0), and (-4, 46, 0). As expected, using three points improved the accuracy of estimation. 
Considering even more points has been found to further improve accuracy. The accuracy 
of computation was also found to depend upon the location of the feature points on the 
ground. 

Image plane quantization - the natural outcome of an imaging process - has a signifi- 
cant impact on the accuracy of the outcome of all computational approaches to computer 
vision problems. Small errors introduced in estimating the camera pose due to image plane 
quantization might have significant impact on later stages of processing. The analytical 
approach proposed in this section lets one compute the error in each of the camera pa- 
rameters separately. Given the scene points and their corresponding image points, one can 
decide in advance which scene points are good for estimating the camera pose and how 
many calibration points might be necessary to have the computed pose within the desired 
accuracy r ang e. One drawback of the proposed analytical approach is that, with the present 
approach, it is not dynamically possible to identify good scene points or select an additional 
point to improve accuracy. 
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Figure 14: Error in camera parameters for sensors with different resolutions: Error in (a) 
X c (b) Y c (c) Z c . 
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2.3 Summary 


Since the camera pose information provided by the GPS and on-board instrument is not 
accurate, we intend to use image-based features such as points or lines to estimate the 
position and orientation of the camera in 3-D. The accuracy of such estimation depends on 
the sensor resolution and sensor position in 3-D. A SVS is envisioned to be equipped with 
sensors of different modalities operating under different weather conditions and lighting 
situations. Since the resolution of these sensors varies, we wanted to explore whether the 
use of image-based features (such as points or lines) when used in a passive triangulation 
could provide a better estimate of the camera pose than what was provided by the GPS 
and other on-board instruments. 

In section 2.1, we described an analytical model for computing the error in the camera 
pose estimation using image-based features. The sensitivity equations developed using the 
above model relate the error in the individual sensor positional parameters to the sensor 
resolution and the position of the image features in terms of the pixel number in the image 
plane. We evaluated three different sensors at six different positions using the equations 
developed based on the proposed analytical model. It is clear from our analysis that only 
HDTV can provide a better estimate than the GPS. The use of HDTV, however, is not 
currently approved or recommended. One alternative is to use the GPS information as the 
initial solution in a non-linear optimization algorithm to obtain an optimal solution from 
the image-based features obtained from the low resolution sensor images. The problem of 
integration of various sensor information was not explored in this research. 

In section 2.2, we proposed a new method for computing the camera pose and the error 
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in the estimated camera pose due to image plane quantization. This procedure was based 
on the fact that lines-of-sight formed by two or more feature points provide an uncertainty 
volume for the camera pose. We proposed a non-linear algorithmic approach for computing 
this uncertainty volume. By selecting more feature points, the uncertainty volume was 
reduced. Selecting good points from the available set of feature points, however, will improve 
the convergence of the algorithm. The earlier model provided the distribution of error in the 
camera pose estimation, whereas the second model provided an algorithm for estimating the 
camera pose and the error in the camera pose, and also minimized the error by considering 
more points. 

If no prior knowledge about the likelihood of error in the estimate is available when 
selecting a point for use in camera pose estimation, then the second algorithm can be used 
alone in a greedy algorithm. However, this does not guarantee a global minimum in the 
error function. By using the first analysis, we can presort the feature points based on the 
sensitivity value at those pixels for a given position of the sensor, and use the features 
incrementally to obtain a global minimum in the error function. 
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Figure 16: The input Passive Millimeter Wave image 

3 Detection of Objects in Passive Millimeter Wave Images 

A Synthetic Vision System (SVS) integrates the outputs of various sensors with GPS/INS 
information and airport geometry database to locate the aircraft within the airport, detect 
obstacles, determine potential conflicts, issue advisories, and sound cockpit alarms. The 
system selects among the outputs of different sensors depending on the visibility conditions, 
and processes these sensor outputs to extract informations to guide the pilot. In the previous 
section we analyzed the sensitivity of three sensors which could be used under three different 
visibility conditions( i.e, FLIR during night, HDTV during day light under normal weather 
conditions, and PMMW during fog). Since the resolution and quality of the images captured 
using these sensors vary, an algorithm for a specific purpose should be designed depending 
on the sensor type. In this section, we describe an algorithm for detection of both the 
runway and obstacles on the runway for images acquired using a PMMW sensor [117]. 

Since the energy attenuation in the visible spectrum due to fog is very large [132], sen- 
sors are being designed to operate at lower frequencies (e.g., 94 GHz). A lower attenuation 
provides the ability to see through fog. Images obtained from sensors operating at this 
frequency (such as the PMMW sensors) are of very low spatial resolution (Fig. 16); how- 
ever, supporting information about the airport and the position, orientation, and velocity 
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of aircraft is generally available. We used this information to guide our image analysis sys- 
tem. The geometric model of the airport contained positions of the runways/taxiways and 
buildings. The navigation instruments provided the position of the aircraft, and on-board 
instruments provided the orientation of the aircraft (yaw, pitch, and roll). We used this 
information to define regions of interest in the image where important features - such as 
runways/taxiways, the horizon, etc. - are likely to be present. Edges corresponding to 
these features of interest axe detected within these regions. After delineating the regions 
representing runway/taxiways, we look for objects inside and outside these regions. The 
proposed system consists of the three functional modules, which are described below. 

• Model Transformation allows input of the airport model and the camera state infor- 
mation to define regions of interest in the image plane. 

• Feature Detection and Localization algorithm operates within these regions of interest, 
to detect the edges of the runway, horizon, etc., in the image. 

• Object Detection algorithm uses simple histogram-based thresholding to detect object 
regions which are distinct from the homogeneous background. Two different thresh- 
olds are used to detect obstacles inside and outside the runway. 

3.1 Model Transformation 

Since PMMW images are low-contrast, low-resolution images, simple edge detection tech- 
niques on these images generate many noisy edge pixels, in addition to those belonging to 
the true edges, such as runways, sky, etc. This problem is alleviated by defining regions 
where the true edges are expected to occur, using knowledge about the aircraft position and 
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a model of the airport. The main function of the module is to define a region of interest on 
the ground plane for each feature in the model and to perform 3-D to 2-D transformation. 
The module also defines a region in the image plane where the horizon line should occur. 

3.1.1 Defining Regions of Interest for Runway Edges 

The error between the expected location of a feature and its actual position in the image 
depends on several factors, most notably the accuracy of the camera position parameters 
used by the model transformation. Furthermore, it is evident from the analysis in section 2 
that the ground area covered by a pixel is a function of the position of the pixel in the image. 
Thus, it is not reasonable to define the search space for each feature as a fixed number of 
pixels centered around the expected location in the image plane. We define the region of 
interest in the 3-D space and then apply transformation to get the corresponding region of 
interest in the image. The extent of the search space in the 3-D space is determined by the 
estimated error in the camera positional parameters (which are based on GPS and on-board 
instrument data). 

The geometric model of the airport contains a sequence of 3-D coordinates for the 
vertices of the runway /taxiway, which form a polygon with n vertices: 

runway = {Pi} , i = 1,2, ... ,n. (21) 

where P t = ( Xi,Yi,Zj ) is one of the vertices of the polygon. Note that Z t = 0. P t Pi+i 
specifies an edge of the polygon. The region of interest is defined as a rectangle on the 
ground which encloses the edge. Therefore, each edge PjPj+i of the polygon is associated 
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with the region of interest by four points bi = (Xj,Yj, Zj), j — 1, . . . , 4, and Zj — 0. 

The width of the region of interest is defined as a function of the width of the run- 
way/taxiway, w accuracy of the GPS data, g(g < 1) and the accuracy of the on-board 
instruments, d{d ^ 1). Note that g and d are determined by the specification and charac- 
teristics of these instruments. This relationship is given by 

width(w,g,d) = —-j~ (22) 

gd 

Note that the minimum width is 0.2iy when g = d = 1, which corresponds to ±10% potential 
displacement of runway edge feature. To limit the search area from being a large fraction 
of the runway width, we limit the search width to 0.4u> - even if gd < 0.5. 

After defining the region of interest for each edge, 3-D to 2-D coordinate transformation 
is performed using perspective projection, rotation, and translation transformation matrices 
[45]. After perspective projection, following special cases are considered: 

• The region of interest degenerates to a line in the image plane when the region is too 
far from the camera, and 

• The region of interest in the image plane becomes very large when the edge is very 
close to the camera. 

A minimum width is assigned in the image plane in order to provide some search space for 
the feature detector in the former case. A maximum width is defined in the image plane 
to further restrict the region in the latter. In our experiment, the minimum and maximum 
widths of a region of interest were set to 10 and 20 pixels, respectively. 
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3.1.2 Defining Search Space for Horizon Line 

When the vertical angular field of view is larger than 26 , a horizon line appears in the 
image (see Fig. 17). The horizon is an important clue to estimating the camera orientation, 
since it gives the roll angle information directly. The expected position of the horizon line 
is easily computed using the camera geometry shown in Fig. 17. The associated region of 
interest is defined to be 20 pixels wide centered around the expected horizon line in the 
image. 

It is possible for the projection of the region of interest onto the image plane to be 
partially outside the image boundary. In such cases, we need to clip these regions so that 
the search space always remains within the confines of the image. This is done using a 
“polygon clip and fill” algorithm [41]. The regions of interest for both the runway and the 
horizon of the image sequence used in these experiments are shown in Fig. 18. 
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Figure 18: Region of interest 


3.2 Feature Localization and Object Detection 

3.2.1 Runway Localization 

The system searches for the expected features within the region of interest, as defined by the 
previous algorithm. This will significantly reduce search time and also avoid the spurious 
response which is likely in such a low resolution input image. An accurate localization is 
necessary to estimate the motion parameters and camera pose. 

A sobel edge detector is applied to the sensor image. We then select one of the four 
scanning directions (-45°, 0°, 45°, 90°) which is approximately orthogonal to the direction 
of the expected edge. Along each scan line, we locate pixels with greatest edge strength. 
Since the runway edge is supposed to be a straight line, we fit a best line to these pixels. 
We also associate a measure of confidence for these detected edges based on the number of 
edge pixels detected along the line. 

3.2.2 Object Detection 

In this section, the region inside and outside the r unway /taxiways are checked separately for 
the existence of stationary or moving objects. The image has three homogeneous regions: 
sky, the runway/taxiways, and the region outside the runway/taxiway. Any objects on or 


52 




outside of the runway/ taxi way are expected to have some deviation in gray level from their 
respective homogeneous background. Therefore, we use histogram- based thresholding for 
object detection. The thresholds which determine this deviation axe set to be different for 
different regions. 

We generate a mask image, which generates three homogeneous regions. Using this mask 
image, we generate the histogram of the original input image and compute its standard 
deviation for each region. The threshold value is determined as a function of the mean 
and the standard deviations. Any area which has gray level lower than the threshold is 
considered to be an object region. An object is assumed to have reasonable size. This 
size restriction on the object can be used to ignore spurious responses resulting from the 
thresholding. Each object is labeled based on the 4-connectivity. 

3.3 Experimental Results 

We tested our algorithm on a test image provided by the NASA Langley Research Center. 
This image was obtained using a single pixel camera located at a fixed point in space. (A 
camera with an array of pixels was under development when this work was carried out.) 
The camera was mechanically scanned to obtain a 50 x 150 pixel image. This is the image 
shown in Fig. 16 . We were also provided with a model of the runway which gave the 3-D 
world coordinates of the runway corners, locations of the buildings, etc. Using these data 
and the single image, we created a sequence of 30 frames to simulate the images from a 
moving camera. Frames l(original), 5, 10 and 15 from this sequence are shown in Fig. 19. 
Edge-enhanced images corresponding to these frames are shown in Fig. 20. The regions of 
interest defined on these frames are shown in Fig. 21. Delineated features superimposed 
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Figure 19: Input PMMW images (a)Frame 0 (b)Frame 5 (c)Frame 10 (d)Frame 15 


on the image are shown in Fig. 22. Objects detected on the runway in Frame 1 and those 
outside the runway are shown in Fig. 23. Warning signals could be generated for each object 
on or near the runway. 

3.4 Summary 

In this section, we described a model-based system for recognition of objects in low resolu- 
tion PMMW image sequences. Using knowledge of the camera motion, the runway model, 
and error in the parameters, the regions of interest for each of the runway edges are defined. 
We applied least square fit on strong edge pixels detected along the direction approximately 
orthogonal to the expected edge, to localize the runway feature within the search region. 
We tested our algorithm on a sequence of 30 frames and good results were obtained. The 
performance of the algorithm depend on the correctness of the search region and the robust- 
ness of the feature localization approach. The correctness of the search region depends on 
the knowledge of error in the camera pose and the runway model which, in turn, depends on 
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Figure 20: Edge images (a)Frame 0 (b)Frame 5 (c)Frame 10 (d)Frame 15 
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Figure 22: Detected runway (a)Frame 0 (b)Frame 5 (c)Frame 10 (d)Frame 15 







(a) (b) 

Figure 23: Detected object (a)Inside the runway (b)Outside the runway 
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the accuracy of the position estimate given by the GPS and INS. However, the robustness 
of the feature localization procedure depends on performance of the edge detector and the 
actual localization method. To make the method more robust, one can employ one or more 
of the following additional processing steps, however, at the cost of additional computation 
time: (1) consider strong edge pixels in each of the eight directions for least square fitting, 
(2) use a weighting function based on the gradient strength, (3) detect and remove outliers 
by computing the deviation of the edge pixels from the fitted runway edge. 

A modified version of the runway detection algorithm is described in [119]. It was tested 
using a video image sequence obtained from a sensor mounted on-board a landing aircraft. 
Our results suggest that a model-based approach for detection of runways performs well, 
in cases of both PMMW and video sensors, and can be used to aid pilots in identifying the 
runway during landing. 

The obstacle detection algorithm used in this work is based on the assumption that 
the regions are homogeneous. This assumption is valid for the PMMW image used in this 
experiment, and has also produced good results in all images of the sequence. On the 
other hand, it is a very weak assumption. It is not valid for images of different modality, 
especially in cases of video images. We have also investigated the real time implementation 
feasibility of various algorithms described in this section using available dedicated image 
processing boards from three separate vendors (Data-cube, Data-translation, and Imaging 
Technology) [29]. Further research on the feasibility of a PMMW sensor in the design of a 
SYS could not be continued for this study due to the unavailability of the sensor array. 
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4 Multiple Motion Segmentation and Estimation 


A key to obstacle detection in Passive Millimeter Wave (PMMW) images is the homogeneity 
property of the different image regions that are identified by using the camera position 
information and the runway model. Since this assumption is not valid in the case of video 
images, a different method needs to be used to detect obstacles. 

An important cue for detecting obstacles in a sequence of images obtained from a moving 
camera is the apparent motion of the brightness pattern in the image. Due to the camera 
motion, though, the whole scene appears to be moving, with image regions due to the 
obstacles moving differently than the background. Depending on the height of the obstacles 
and the 3-D motion of the obstacles relative to the camera, the apparent motion of the 
corresponding image regions will be different. 

In this section, we describe a new segmentation algorithm for detecting obstacles in 
video image sequences using motion as the main cue. We assume that the background is 
planar or is piecewise planar and use a recursive motion-based segmentation algorithm to 
segment image into regions corresponding to different moving objects. Initially, to detect 
motion, optical flow is computed at corner-like feature points in the image. Optical flow 
vectors axe then grouped into regions, and motion parameters axe computed for each region 
using the new recursive motion-based segmentation and estimation algorithm. 

A brief review of the past research in the area of motion detection, segmentation, and 
estimation is given in the following section, followed by an introduction to the proposed 
approach. The details of the individual stages of the proposed motion segmentation and 
estimation method are described in detail in Sections 4.3 and 4.4. The results obtained 
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using both simulated and real images are also presented. 


4.1 Methodologies for Motion Detection and Estimation 

The relative motion between the objects in a scene and a camera gives rise to the apparent 
motion of objects in a sequence of images. This motion may be characterized by observing 
the apparent motion of a discrete set of features or brightness patterns in the images. Motion 
of objects in the scene can be derived by analyzing the motion of features or brightness 
patterns associated with the objects in the image sequence. Two distinct approaches have 
been developed for the computation of motion from image sequences. 

The first approach, known as the feature-based approach, is based on extracting a set of 
relatively sparse, but highly discriminatory, two-dimensional features in the images corre- 
sponding to three-dimensional object features, such as corners, lines corresponding to edges 
demarcating the surfaces of the object in 3-D, etc. Such points and/or lines are extracted 
from each image, and inter-frame correspondence is then established between these fea- 
tures. Constraints are formulated based on assumptions such as rigid body motion i.e, the 
3-D distance between two features on a rigid body remains the same after object/camera 
motion. Such constraints usually result in a system of non-linear equations. The observed 
displacement of the 2-D image features are used to solve these equations, leading ultimately 
to the computation of motion parameters for objects in the scene. 

The other approach involves computing the two-dimensional field of instantaneous ve- 
locities of brightness values in the image plane or optical flow. A relatively dense flow field 
is estimated, usually at every pixel in the image. The optical flow is then used in con- 
junction with added constraints or information regarding the scene to compute the actual 


59 



three-dimensional relative velocities between scene objects and camera. 

4.1.1 Feature-based Motion Estimation 

In this approach, point and line features are extracted from each image and inter-frame 
correspondence over two or more frames is established. Equations relating the relative 
position and motion between the camera and the imaged scene to the feature motion in 
the image axe developed and solved to estimate the structure and motion. Many linear and 
non-linear methods for computing the structure and motion from line and point features 
are reported in the literature [6, 15, 16, 25, 26, 27, 28, 36, 37, 38, 50, 62, 63, 67, 68, 71, 72, 
79, 89, 91, 92, 103, 104, 105, 106, 123, 126, 127, 128, 129, 131, 133]. 

Non-linear methods iteratively solve non-linear equations derived to relate 3-D motion 
parameters with the observables in the image plane as in [27, 28, 36, 62, 106], whereas 
methods using linear algorithms provides closed-form solutions to the motion parameters 
[37, 38, 63, 67, 68, 71, 72, 88, 123, 126, 128, 129]. The challenge in the case of non-linear 
methods is to solve non-linear equations in the form of objective functions, to minimize the 
error between the observed and the predicted motion. 

Although various numerical methods could be applied to these non-linear equations, 
the global minimum may not be reached unless the initial guess is sufficiently close to the 
true value [28, 38, 128]. In addition a non-linear approach is computationally expensive, 
due to the exhaustive search of the solution space. Although linear algorithms use differ- 
ent methods to determine the unknowns, they share the same key structure: determine 
the intermediate parameters, which are called the essential parameters based on the epipo- 
lar constraint, then compute the motion parameters from the essential parameters. Even 
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though the solution to a linear system is guaranteed (except for degenerate cases), the 
solution is highly sensitive to noise [37, 38, 123]. 

To improve the accuracy of estimation, researchers have proposed using either more data 
points than the required minimum in a two-view algorithm, or use data from a sequence 
of images [6, 15, 16, 27, 28, 91, 106]. Algorithms that use multiple images of a rigid 
scene to produce a more accurate reconstruction either process data from all of the images 
simultaneously, as in the case of batch algorithms [27, 28, 91, 104, 106, 126, 131], or update 
the previous estimate using the measurement data from the new frame by maintaining 
some notion of state using an extended Kalman filter [6, 15, 16]. An optimal estimation 
procedure to overcome the noise sensitivity of linear algorithms and inaccuracy of non-linear 
algorithms due to an incorrect initial guess is suggested in [28, 127]. This is a non- linear 
optimization procedure, where the initial value for the non-linear algorithm is provided by 
a conventional linear algorithm. 

A few researchers have considered the problem of reconstructing scenes using straight 
line segments [25, 26, 37, 50, 67, 68, 103, 104, 129, 133]. When lines are used as features, 
two views are no longer sufficient and a minimum of three views are required [37]. This is 
because 3-D lines possess an additional degree of freedom when compared to the 3-D points. 
In other words, one can slide a 3-D line along itself and obtain the same line. When the lines 
are on the same plane, however, a linear algorithm can be formulated from two views [114]. 
Four or more lines are required to uniquely estimate the motion parameters. The non-linear 
algorithm described in [67] requires six line correspondences over three frames. The linear 
algorithm described in [68] requires 13 line correspondences over the same number of frames. 
In reality, the features detected could be sparse and, as a result, reliable estimation cannot 
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be achieved using a single cue. An integrated approach for motion and structure estimation 
is described in [114], which integrates the measurement from various features such as points, 
lines, regions, and texture using a non-linear optimization procedure. 

4.1.2 Optic Flow-based Motion Estimation 

In this approach, the instantaneous changes in image brightness values in the image are 
analyzed to yield a dense velocity map called the image flow or the optical flow. The 
three dimensional motion and structure parameters are then computed based on various 
assumptions and/or additional information. No correspondence between features in suc- 
cessive images is required. The optical flow technique relies on local spatial and temporal 
derivatives of image brightness values. 

The image velocities are, in general, functions of the motion of viewed objects relative to 
the camera, objects’ locations in 3-D space and 3-D structure of the objects. The recovery 
of the 3-D motion and structure information from the sequence of monocular images can 
be decomposed into two steps: 1) compute image plane velocities from changes in image 
intensity values, and 2) use optical flow to compute 3-D motion and structure. Since the 
optical flow constraint equation relating the optical flow to the spatial and temporal image 
intensity gradient is not sufficient by itself to specify the optical flow uniquely, additional 
assumptions are made to constrain the solution. Popular assumptions include: 1) optical 
flow is smooth and neighboring points have similar velocities, 2) optical flow is constant 
over an entire segment of the image, and 3) optical flow is the result of restricted motion 
for example, planar motion. 

Horn and Schunck [53] combined the gradient constraint equation with a global smooth- 
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ness term to constrain the estimated velocity field by minimizing the error over a domain 
of interest. Lucas and Kanade [74] implemented a weighted least squares fit of local first- 
order constraint to a constant model for optical flow in a small spatial neighborhood. Nagel 
[82, 83, 84] used the second-order derivative to measure the optical flow along with the 
oriented-smoothness constraint, in which smoothness is not imposed across steep intensity 
gradients. Since accurate numerical differentiation may be impractical (due to either noise, 
a small number of frames, or because of aliasing in the image acquisition process), region- 
based matching methods are used for computing optical flow in [3, 94]. Such approaches 
define the image velocity as the shift that yields the best fit between image regions at dif- 
ferent times. Finding the best match amounts to maximizing a similarity measure such as 
normalized correlation [94] or minimizing a distance measure such as the sum of squared 
difference [3]. 

Fleet and Jepson [40] investigated the extraction of motion information using Fourier 
techniques. Their method defines the component velocity in terms of the instantaneous 
motion of level phase contours (zero crossing in Difference of Gaussian or Laplacian of 
Gaussian image is viewed as level-phase crossings) in the output of band-pass velocity- 
tuned filters. Given the component velocity estimates from the different filter channels, a 
linear velocity model is fit to each local region. Heeger [52] used 3-D Gabor filters tuned to 
different spatiotemporal frequency bands and described a method for combining the outputs 
of the filters to compute local velocity vectors. 

Having computed the optical flow, there still remains the problem of computing the 
motion and structure of the object in 3-D space. The estimation of structure and motion 
is based on the key assumption that the optical flow varies smoothly and the surface of the 
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object is smooth. A system of equations that relates the optical flow and its first and second- 
order derivatives to the 3-D structure and motion parameters is derived. This non-linear 
over-determined system of equation may or may not yield a unique solution. A detailed 
analysis of numerous cases has been presented by Subbarao [112] and Waxman [124, 125], 
who have derived closed-form solutions for these cases. Subbarao shows that the solution, 
in general, is unique and at most four solutions are possible in certain cases. Neghadaripour 
[86] also addressed the ambiguity in interpreting optical flow produced by curved surfaces 
in motion. The ambiguities inherent in interpreting noisy flow fields are discussed by Adiv 
[ 2 ]- 


4.1.3 Multiple Motion Segmentation and Estimation 

The general paradigm for time- varying imagery analysis in cases of feature-based approach 
uses feature extraction, feature motion, and motion parameter estimation. In the formalism 
of optical flow, the first two steps are merged in the computation of optical flow. The works 
mentioned in the above review assume that the feature points, or optical flow vectors, result 
from a single rigid object in motion. The main foci of such work are the minimum number of 
features required to compute a solution, the possibility of multiple solutions, and the effect 
of noise. If an image contains two or more objects moving independently, a segmentation 
procedure becomes necessary before any estimation can be done. In the optical flow method, 
this consists of grouping pixels corresponding to distinct objects into separate regions (i.e., 
segmenting the optic flow map, and then computing the three-dimensional coordinates of 
surface points in the scene corresponding to each pixel in the image at which the flow 
is computed). In feature-based analysis, the computing structure corresponds to forming 
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groups of image features for each object in the scene and then computing the 3-D coordinates 
of the object feature associated with each image feature. 

In autonomous navigation, it is essential to obtain such a 3-D description of the static 
environment in which the vehicle is traveling, and to estimate the range from an obstacle to 
the cam era, in order to avoid collision by changing the vehicle’s nominal path. The methods 
for range estimation described in [7, 22, 95, 96, 97, 107, 108, 110, 115] used Kalman filter 
to recursively estimate the range at feature points in the image. The range estimation 
procedure described in [118] used a simple incremental weighted least squares method for 
estimating the 3-D position of a stationary object using known camera state parameters. 
T his algorithm extended the epipolar plane image analysis described in [8, 14] to general 
camera motion by assuming the camera motion to be piecewise linear. Adiv segmented 
the optical flow based on fit to an affine model [2]. The algorithm further grouped the 
resulting regions to fit a model of a planar surface undergoing 3-D motions in perspective 
projection. In [78], optical flow associated with the contour chain points is computed and 
chain points are grouped based on the spatial proximity and coherency of the apparent 
movement. Thompson and Pong [121] and Nelson [87] detected moving objects on the basis 
of simple flow clustering or inconsistency with the background flow. Black [12] described 
an approach that formulates a model of surface patches in terms of constraints on intensity 
and motion, while accounting for discontinuities. An incremental minimization scheme is 
used to segment the scene over a sequence of images. 
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4.2 Problem Statement and the Proposed Approach 


The focus of this work is to detect obstacles in video image sequences obtained from a 
moving camera. In general, an obstacle is defined as an object, either stationary or moving, 
which is in the path of the moving camera. We assume that the background is planar or is 
piecewise planar. We are interested in detecting ground-based obstacles. 

Since the camera is moving resulting in image motion everywhere on the image plane, 
simple image differencing approaches (such as those described in [51, 58, 57]) are not ap- 
plicable. Thompson and Pong [121] have shown by example that, different motion and 
structure situation can result in identical optical flow distribution and hence, it is impos- 
sible to derive motion or structure information uniquely, given only the image and the 
optical flow field. If the camera motion and the background model axe available, methods 
using warping (such as the one described in [20, 44, 115]) can be used to detect obstacles. 
Warping parameters are computed by matching image windows in two frames (as in [20]) 
or using the known camera motion and plane parameters (as in [44, 115]). Obstacles are 
detected by thresholding the residual optical flow computed using the warped images. If 
nothing is known about the camera motion or the background, approximate motion models 
are hypothesized and are verified for individual flow vectors or for a group of flow vectors 
as in [1]. 

In [1] an affine motion model with six parameters is used to describe the optical flow at a 
pixel in the image plane. A six dimensional Hough space is formed. Each flow vector votes 
to one or more bins of this Hough space, depending on the error between the model flow 
(computed using the affine motion model) and the actual flow (computed at the pixel using 
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an optical flow algorithm). The success of the Hough method relies on prior knowledge 
about the range of parameter values. The accuracy of the method relies on the resolution 
of the Hough space. 

In our approach, we assume that the background (i.e., the runway) is planar or is 
piecewise planar, and we use a planar motion model to describe the image motion. Even in 
cases where the background is not planar, the assumption is valid if the ratio of distance to 
the scene point from the camera to the variation in the depth of scene points is large. Based 
on this model, we develop a least square model fitting algorithm to segment the image and 
detect the obstacles. Unlike the Hough method, we do not have to rely on knowledge of 
the range of the model parameters. Instead, we estimate the parameters from the available 
data and refine the estimate recursively by removing the outliers. 

Optical flow is computed using Lucas and Kanade’s algorithm [74], and using Simon- 
celli’s [93] modification to compute the covariance of the estimated optical flow. Unlike 
the conventional approach, which theoretically computes the optical flow at every pixel, we 
compute the optical flow only at selected feature points. Since the optical flow is large, and 
to avoid errors due to temporal aliasing, we computed the optical flow using hierarchical 
computation method and Gaussian pyramid. The details of the optical flow computation 
are given in the Section 4.3. The planar motion model and the segmentation algorithm to 
segment single and multiple motion axe described in Section 4.4. 

Various stages of our motion-based segmentation algorithm were tested using both real 
and synthetic image sequences provided by the NASA Ames Research Center. Frames 0, 25, 
50, and 75 of the 90-frame real image sequence, runway.crossing.new (obtained by a camera 
mounted on-board a landing aircraft with a truck moving across the runway) are shown in 
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Fig. 24. In this imaging situation, the aircraft approaches the runway at an average velocity 
of (60.0, 3.0 8.0) feet/second, and the truck is moving across the runway at an average 
velocity of (13.0, 20.0, 2.0) 4 feet /second. The distance to the truck from the camera during 
the flight duration of approximately three seconds ranges from 500-370 feet. In addition to 
the truck and the runway lines, tire marks can be seen very clearly on the image. These 
tire marks are used as additional features for estimating the runway plane parameters and 
the detection of obstacles using motion-based segmentation. Synthetic images are created 
using a simulation software provided by the NASA Ames Research Center. Fig. 25 shows 
frames 1, 50, 100 and 150 of the sequence, landing.normal.S2L, obtained using this software. 
The image sequence (which was originally in color) was converted to gray scale and the 
experiments were conducted on the gray scale image. These images were simulated for the 
case of a c am era assumed to be mounted on-board a landing aircraft while another aircraft 
was crossing the runway. The image sequence was simulated using models for runway, 
aircraft, and textures for the runways, taxi ways, sky, etc. 

4.3 Optical Flow Computation 

Many algorithms for computing optical flow have been described in the literature. (These 
were briefly introduced in the previous section.) Barron et al.[10] has classified these algo- 
rithms into four classes: gradient-based, region-based, phase-based, and energy-based, and 
has reported a good, quantitative evaluation of performance of various existing algorithms 
in each of these four classes. Despite their differences, many of these techniques can be 

4 The nonzero velocity of the truck in the vertical direction is due to the slope of the runway with respect 
to the reference coordinate system. 
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(C) (d) 

Figure 24: Real image sequence: runway.crossing.new (a)Frame 0 (b)Frame 25 (c)Frame 
50 (d)Frame 75 
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(c) (d) 

Figure 25: Synthetic image sequence: landing.S2L-normal (a)Frame 1 (b)Frame 50 (c)Frame 
100 (d) Frame 150 
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viewed conceptually in terms of following three stages of processing. 

• Prefiltering or smoothing with low-pass/band-pass filters, in order to extract signal 
structures of interest and enhance the signal-to-noise ratio, 

• Extraction of basic measurements, such as spatiotemporal derivatives (to measure 
normal components of velocity) or local correlation surfaces, and 

• Integration of these measurements to produce a 2-D flow field, which often involves 
assumptions about the smoothness of the underlying flow field. 

According to Barron’s survey. Lucas and Kanade’s method [74] provides estimation to 
sub-pixel accuracy and performs most consistently and reliably over all their test images. 
Due to these reasons we used the Lucas and Kanade’s algorithm but with a modification 
suggested by Simoncelli et al. [93] that uses a statistical model to account for noise. The 
method not only provides the estimates of optical flow, but also their covariances. The 
algorithm requires at least five frames, and optical flow is computed for the central frame. 
Time and space gradients are found at positions of interest, and are used to estimate the 

optical flow. 

Let J(x, y, t ) be the image intensity signal as a function of position of a pixel (x, y) in the 
image plane and time t. Assuming that there is no change in illumination, the brightness of 
a particular point in the pattern remains constant. Hence, the total derivative of the image 
intensity function must be zero at each position in the image at every time. This results in 
the standard gradient formulation of the optical flow problem written as 

I a • u + It = 0 (23) 


71 



where 



(24) 


and I x , I y and I t are the spatial and temporal derivatives of the image J, and u = (u,v) 
is the optical flow at position (x, y) and time t that the derivatives have been computed. 
Because the constraint is formulated only in terms of the first derivative, image intensity is 


implicitly approximated as a planar function. 

The squared error function based on the derivative constraint can be written as 


E(u) = [l s -u + I t \ 


(25) 


Linear Least-Squares Estimate (LLSE) of u as a function of I s and I t is computed by setting 
the gradient of this equation to the zero vector: 


A E (u) = M * u + b = 0 


(26) 


where 


M = f 3 IJ = 




( \ 

1.1, 

,b = 

I x It 

IT f 2 

y I y j 


\ Ivlt / 


(27) 


The matrix M is always singular, since the solution is based on a planar approximation to 
the image surface at a point and, therefore, suffers from the aperture problem. Eq. 23 only 
places constraint on the velocity vector in the direction of I s ] that is, on the component of 
flow in the direction of the gradient. 
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In order to eliminate the singularity problem, one can assume that the velocity vector 
is constant in a local region. This can be done by writing an error function based on the 
normal constraints from each point within a patch, indexed by a subscript i € {l,2,3,...n}. 

e («) = 53 [£ ( Xi > y*> “ + h fa, yi . tj\ ( 28 ) 

i 

Computing the gradient of this expression with respect to u gives: 

A u E(v) = 53 (Mi -u + fy (29) 

i 

with solution 

fi=-(x>‘) '(zs) < 30 > 

where 

Mi~ M (xi, yut),bi = b ( Xi , yu t) 

as in Eq. 27. Lucas and Kanade’s approach [74] based on a Taylor series approximation 
to the solution of a matching problem in the context of stereo vision results in a solution 
identical to the equation 30. A weighting function Wi is also included in the summation, in 
order to emphasize the information closer to the center of the averaging region. 

Although the above gradient-based method can provide estimates over a greater area 
than those obtained by feature matching, there will be errors in the estimated flow due 
to errors in the derivative computation, aliasing, imprecision in the derivative filters, etc. 
Even if the derivative measurements are error-free, the constraint in Eq. 23 may fail to be 
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satisfied because of changes in lighting or reflectance, or the presence of multiple motions. 
Optical flow computation can be viewed as an estimation problem resulting in a probabilis- 
tic framework that allows these uncertainties to be represented in the computation [93]. 
Uncertainties computed in the form of a covariance matrix can be used as a weighting 
function in the interpretation stage. 

Consider the total derivative constraint in Eq. 23. Let T]\ and 772 be independent additive 
Gaussian noise terms describing the errors resulting from a failure of planarity assumption 
and errors in the temporal derivative measurements. Then Eq. 23 can be rewritten as 

I s -(u- rft) + It = m, rn~ N (0, Ej) 


or 


I s ■ U + It — Is ■ 'Hi + Tfi 


(31) 


Eq. 31 describes the conditional probability, P {it The desired conditional 

probability can be written using the Bayes’ rule as 


, _ P(l t \u,I s )-P(u) 

(u- I /„/,)= ^ 


(32) 


The prior distribution P ( u ) is chosen to be a zero-mean Gaussian with covariance £ p . The 
mean fi a and the covariance £3 may be derived using standard techniques [33]: 


Us = -E„-/;(4 , S 1 />£ 2 ) 1 It 
£a = [is + ^ 2 ) if + 1 


(33) 
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If Ei is chosen to be a diagonal matrix with diagonal entry o\ and the scalar variance of 772 
is ct 2 = E 2 , then the mean and variance of the estimated optical flow can be written as 


S . 6 

“ (rxi || 4 |P +CX2) 

5 I 
(<7l || Is II 2 +^ 2 ) T 


(34) 


Since the distribution is Gaussian, the Maximum Likelihood Estimation (MLE) is simply 
the mean, na- The optical flow is computed at selected feature positions detected using a 
feature detection algorithm described in the following section. 


4.3.1 Feature Detection 

Flow fields generated by existing algorithms are noisy and partially incorrect. Algorithms 
for interpreting these fields fail under such conditions. Although gradient-based techniques 
can produce a dense optical flow field, the estimated optical flow is not reliable at regions 
of low contrast, since the gradient is small, and at edge points, due to the aperture effect. 
But full optical flow can be determined at corners. Hence, we compute the optical flow 
only at corner-like feature points, unlike the conventional optical flow algorithms, which 
theoretically compute the optical flow at every pixel in the image plane. We do not use the 
matching approach to compute the disparity; instead, a gradient-based method is applied 
to the local region centered around the pixel. 

Conventional feature-based methods for computing the optical flow identify distinct 
2-D features on the image, then an inter-frame correspondence is performed to compute 
the image disparity at each feature point. It has been shown that, at any given pixel, 
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theoretically, the process of computing the optical flow using region matching by Sum of 
Squared Difference (SSD) of image intensity is identical to computing the optical flow using 
a gradient-based approach. Optical flow methods proposed by Anandan [3] and Singh [94] 
compute optical flow using region matching approaches. These approaches can be viewed 
as feature-based. But, features are not explicitly identified using any criterion such as 
variance measure, interestingness or corner measure etc. Instead a local region at every 
pixel is identified as an interesting region. In this work, we integrated both feature-based 
and field-based approaches. Unlike Anandan’s approach, we computed the optical flow only 
at distinct feature points where optical flow computation is expected to be reliable. Instead 
of region-based matching, we use the gradient-based approach applied to the local region 
surrounding the feature point. 

Barnard and Thompson [9], in their stereo disparity analysis, used points with high 
variance in all four directions as features computed using the Moravec interest operator 
[80]. Meygret and Thonnat [78] used contour chain points to compute the optical flow 
using matching. Regions of size 11x11 pixel with high variance are used as features in [95] 
to compute range to scene points using a Kalman filter-based recursive approach. We used 
the SUSAN (Smallest Univalue Segment Assimilating Nucleus) corner detector proposed by 
Smith [99]. This corner operator has demonstrated better performance than other corner 
detectors [98], and has also been used successfully to detect and track feature motion [100], 
A brief description of the SUSAN corner detector is given below. 

Smith and Brady [98, 100] have used a new approach known as the SUSAN principle to 
locate edges and corners in an image. Consider a rectangular box, as shown in Fig. 26(a). 
A circular mask is shown at seven different positions in the image. The central pixel of the 
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mask is known as the nucleus. The area of the mask having the same (or similar) intensity as 
the intensity of the nucleus is known as “USAN” (Univalue Segment Assimilating Nucleus). 
In Fig. 26(b), the USAN is shown in white. Note that the USAN area is largest in the 
uniform portions of the image, smaller in the edge areas, and smallest near the corners of 
the image. Using this principle, one can locate the edges and corners of an image by taking 
local minima of USAN. Since the minimum is taken, the principle is known as SUSAN or 
Smallest Univalue Segment Assimilating Nucleus. 

The SUSAN corner detector was applied to video image sequences captured from an 
aircraft. Fig. 27 shows zoomed part of frame number 50 of the image sequence run- 
way-crossing.new. The output of the corner detector is shown in Fig. 28. The detector 
detects corner-like features wherever there is intensity variation in at least two directions. 
It can be seen that not only do the features correspond to the moving truck, but extraneous 
features such as tire-marks are detected by the corner detector. 

4,3.2 Hierarchical Approach for Optical Flow Computation 

The gradient-based approach for optical flow computation will fail if there is too much 
spatiotemporal aliasing (i.e., if the displacement being measured is greater than one half 
of a cycle of the highest spatial frequency present in the pre-filtered image sequence). In 
the case of a feature-based approach, the matching process that must accommodate large 
displacements can be very expensive to compute because of the large search space. Simple 
intuition suggests that, if large displacements can be computed using low resolution image, 
great savings in computation will be achieved. Higher resolution image can then be used 
to improve the accuracy of the displacement estimate by incrementally estimating small 
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Figure 26: SUS 
shown in white 




(b) 

f principle (a) Circular masks at different places on the image (b) US AN 
or. 


78 








Figure 27: 


Zoomed part of frame 50 in the image sequence runway .crossing-new. 



Figure 28: Detected corner-like features superimposed on the zoomed part of frame 50 in 
the image sequence runway.crossing.new. 
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displacements. It can also be argued that it is not only efficient to ignore the high resolu- 
tion image information when computing large displacements, it is, in fact, necessary to do 
so. This is due to aliasing of high spatial frequency components undergoing large motion. 
Aliasing is the source of false matches in correspondence solutions or (equivalently) local 
minima in the objective function used for minimization in optical flow computation. Hence, 
matching or minimization in a multi-resolution framework helps eliminate problems of this 
type. 

Hierarchical approaches have been used by various researchers in the past. Bergan et al. 
[11] described a hierarchical estimation framework for the computation of different model- 
based representations of motion information. Meygret and Thonnat [78] described an optical 
flow estimation algorithm based on a multi-resolution technique for matching contour chain 
points. A multi-grid approach for computing optical flow is described in [34]. Anandan [3] 
used large-scale intensity images to obtain rough estimates of image motion, which were 
then refined using intensity information at smaller scales. Optical flow is computed at every 
pixel in the image using region-based matching. We compute the flow only at feature points 
detected using the SUSAN corner detector in the high resolution image. 

Fig. 29 describes the hierarchical optical flow computation framework used in this work. 
The basic components of this framework are: 

• pyramid construction, 

• optical flow computation, 

• image warping, and 

There are a number of ways to construct the image pyramids [90]. We built a Gaussian 
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Figure 29: Diagram of the hierarchical optical flow estimation framework 
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pyramid on each frame of the image sequence [18, 90]. The Gaussian pyramid is a sequence 
of images in which each is a low-pass filtered copy of its predecessor. Let be the i t ^ frame 
of the original image sequence; this becomes the bottom or zero level, Lo of the pyramid. 
Let us represent this image as if. Each pixel of the Gaussian pyramid image at the next 
pyramid level L\ y image //, is obtained as a weighted average of pixels from if within a 
5x5 window. Each pixel of if at pyramid level L<i is then obtained by the same pattern of 
weights from l\ at level L\. The sample distance is doubled at each level. As a result, each 
image in the sequence is represented by an array which is half as large as its predecessor. 
The process which generates each image from its predecessor is called a REDUCE operation, 
since both resolution and sample density are decreased. Let if be the original image and 
let if 1 be the image at the top level N of the pyramid. Then for 0 < / < N we have 

2 2 

I l i(t,j)- 5Z ^ w{m,n)I l i ~ 1 {2i + m,2j + n) (35) 

m=— 2 n=— 2 

The weighting function w is called the generating kernel which is separable, normalized, and 
symmetric. The one dimensional kernel is given by w — [c b a b c] such that a + 2c = 2b. 
Combining these constraints we have: a = free variable, b = and c = j 

Optical flow is computed using the gradient-based approach explained in the previous 
section. This method requires a minimum of five image frames; optical flow is computed 
for the center frame. Let be the image frames used to compute the opticad flow where 
i = 1,2, 3,4, 5. Initially, Gaussian pyramid images au"e computed for each frame. We 
have computed a Gaussian pyramid up to 3 levels (Lo,Li,L 2 )i where level Lq represents 
the original image. Features axe detected in frame I 3 at level Lq , and their positions axe 
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predicted for level Z,2- Optical flow is computed for each feature at level L 2 using the 
Gaussian pyramid images at that level. Let uf be the optical flow computed for a feature 
Fi at level L 2 . 

The optical flow computed at level L 2 and the feature position are upsampled to level 
L\. Using the upsampled optical flow, the pyramid images at level L\ are warped. The 
warping operation is defined as 

Iwarpedi^tV) = Ioriginal( x ~ u ( a: i y)i V V ( X >P)) (36) 

However, warping is done only for a local region around the position defined by the above 
relation. We use bilinear interpolation to evaluate the /original a t fractional-pixel locations. 
The optical flow is computed using the warped images, resulting in “optical flow correction” 
6u} to be composed with the original estimate, to give a new optical flow estimate u-. The 
optical flow at level L x after correction is given by: 

u] = + Su] (37) 

The process of propagation of flow values and computation of flow correction is repeated 
at each level of the pyramid until the flow fields are at the resolution of the original image 
sequence. 

We applied our multi-resolution optical flow computation algorithm on the image se- 
quence runway-crossing.new. The Gaussian pyramid images and the optical flow obtained 
using our hierarchical framework for this sequence are shown in Fig. 30. 
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(c) (d) 

Figure 30: Gaussian Pyramid images: (a)LevelO image (b)Levell image (c)Level2 image 
(d) Optical flow at LevelO 
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4.4 Motion-based Segmentation 


Various approaches for motion estimation based on point or line features assume that the 
features undergo similar motion or features belong to a single object in motion. Optical flow 
computed using either the field-based or the feature-based approach computes the apparent 
image motion without the knowledge of the type of motion or the number of objects in 
the scene being viewed. Hence, before any estimation can be done using these optical flow 
vectors, flow vectors belonging to different objects and the background need to be grouped 
to identify image regions corresponding to the object /background. 

In the context of obstacle detection, the problem would be to distinguish the obstacle 
from the background. If the camera is also moving, then background motion will be the 
dominant motion. If the camera motion and the model of the terrain in which the camera 
is moving are available, image warping can be used to compensate for the ego-motion of the 
camera and identify the obstacles. This is the approach taken in [20, 44, 115]. If nothing 
is known about the camera motion, then optical flow vectors are grouped based on some 
approximate motion model to form object hypothesis, as in [1], The hypothesis is then 
verified by computing the deviation of the actual optical flow from the hypothesized model 
flow. 

We use the motion model for segmenting the video image sequence where the background 
(i.e., the runway, in our application) is assumed to be planar. This is an approximate model 
used to hypothesize the background region. Even if the background is not perfectly planar, 
a planar motion model is valid in situations where the ratio of the range to the scene point 
to the variation in the range to other scene points is very small. As the camera approaches 
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the runway, the planar motion assumption may not be valid. We still use the planar motion 
model to segment the image by assuming that the background is piecewise planar and 
applying the planar motion model to small local regions separately. In the next section, 
we derive the image velocity equation for the planar motion model case and describe an 
algorithm for recovering model parameters using a least square fit method. 

4.4.1 Planar Motion Model 

The imaging geometry and various coordinate systems used in this analysis are shown in 
Fig. 31. The earth reference coordinate axes X e ,Y e ,andZ e are rigidly affixed to the Earth 
at an arbitrarily selected but known point (on the runway). The helicopter body axes 
X b ,Y b ,andZ b are assumed to be located at the center of gravity of the helicopter, with the 
X b axis pointing forward and Z b axis pointing downward. The sensor coordinate system 
axes X s ,Y s ,andZ s are attached to the sensor, with the axis X s passing through the optical 
axis of the sensor and originating at the focal point of the sensor. In this work, we use 
“sensor coordinate system” and “camera coordinate system” interchangeably. The image 
plane is located at a distance / from the focal point of the sensor, and is perpendicular to 
the optical axis of the sensor. 

Let (X, Y, Z) be the 3D coordinates of a stationary point P in the sensor coordinate 
system. The projection of the point P in the image plane is given by 

x = a!^,v = Sj (38) 

where A is the aspect ratio. As the camera moves, the image point of P will also move. 
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Figure 31: Imaging geometry 
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Differentiating the above perspective equation with respect to time, we have the image 
velocity u = (u, v) 


u 

v 


dx a XY-YX 
dt= Af x> 
dy _ XZ-ZX 
dt~ f X 2 


(39) 


Let p = (X, Y, Z) be the position vector for the object point P in the sensor coordinate 
system. If point P is assumed to be fixed in the Earth frame, the rate of change of p in the 
camera’s axes system can be determined using Coriolis’ equation [72]: 
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= — T — R x p 


\ Z I 

'-V ; ^ / 
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-V Z 
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— Zu)y + Y u?z 
—Xu)z + Zu>x 
^ —Yu>x + XuJy J 


(40) 


where T = (Vx, Vy, Vz) and R = (u>x,vy, u )z) are the camera’s translational and rotational 
velocities. 

Substituting for p in Eq. 39, we obtain a well-known optical flow equation which relates 
the camera motion, object motion in the image, and object’s range: 


/ \ 
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(41) 
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where 



As is evident from the above equation, optical flow can be decomposed into two com- 
ponents - ( u t ,v t ) due to translation, and {uu,,v u ) due to rotation. Only the translational 
component of the image velocity is a function of the range. 

Let the point P be on a planar surface whose plane equation in the camera coordinate 
system is given by: 

k\X + k 2 Y + k 3 Z = 1 (43) 

Substituting for X and Y from the projective transformation equation, we have: 

ki + k 2 — + k 3 ^~ = — (44) 

Af f X 

Substituting for the y in the image velocity equation, we have the optical flow equation for 
the planar motion case as: 

u = <zi + a 2 x + a 3 y + ajx 2 + agxy 

v = 04 + a 3 x + agy + a»y 2 + ajxy (45) 
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where 


ai = 

-AkifVy - f Aw z 

a2 = 

kiV x -k 2 V Y 

a 3 = 

—k 3 AVy + Au>x 

a 4 — 

—kifVz + fwy 

a 5 — 

k 2 Vz wx 
A A 

as = 

kiV x - k 3 V z 

aj = 

k 2 Vx wz 
Af fA 

as = 

k 3 Vx , Wy 

/ + / 


From the above equation, it is clear that the instantaneous motion of a planar surface 
undergoing rigid motion can be described as a second order function of image coordinates 
involving eight parameters. The eight coefficients (ai, . . . ,as) are functions of the motion 
parameters and the the surface parameters. If the ego-motion parameters are known, then 
the three-parameter vector k can be used to represent the motion of the planar surface. 
Otherwise the eight-parameter representation can be used. In either case, the flow field is 
a linear function in the unknown parameters. Since our objective is to segment the image 
based on structure and motion, we used the eight-parameter representation for the flow 
field. 

In the following sections we describe a least square fitting algorithm for recovering 
the eight parameters from a set of optical flow vectors, followed by our new segmentation 
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algorithm based on this approach for segmenting multiple motions from optical flow field. 


4.4.2 Recovering Model Parameter from Optical Flow 

It is clear from the above analysis that each data point [i.e., optical flow (u, v) at pixel 
(a:, a/)] results in two equations containing eight unknown parameters. Hence, at least four 
points axe required to solve for model parameters, assuming that all four points belong to 
the same planar motion. If more than four points are available, then a Least Square Fit 
can be used to compute the best model which fits all the points. 

4.4.3 Least Square Model Fitting 

Let ( Ui,Vi ) be the optical flow at pixel (x{, y,). The image velocity equation can be written 
as: 
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The least square solution for computing the motion model parameter vector a is given by 


a = (X 1 X)~ 1 X T u 


(48) 


where 




v\ 
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1 X2 V2 0 0 0 x\ x 2 y 2 

0 0 0 1 X 2 y 2 X 2V2 vl 


,X = 


1 x n y n 0 0 0 


KnVn 
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If E u is the covariance of the computed optical flow, the least square solution for a is given 
by: 

a = (X T E- 1 X)- 1 X T E- 1 u (49) 


where the covariance of the estimated motion model parameter is: 


E a = (J^E - 1 *)" 1 


(50) 
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4.4.4 Detecting Outliers 


If the optical flow computed is exactly equal to the expected 2-D projection of the actual 3-D 
velocity, then the motion model parameters computed using the above least square method 
will be accurate. Since the optical flow is computed using the gradient approach, based on 
the flow constraint equation which is applied on the image locally, computed optical flow is 
only approximately equal to the 2-D projection of 3-D velocity. In addition, the computed 
optical flow will be noisy due to errors in computing the gradient, multiple motion, etc. 
Data points with large errors or flow vectors due to a different motion, called the outliers, 
can significantly bias the solution. Such points need to be identified and removed before 
reliable estimation can be done. 

Let (up(x,y),v p (x,y)) be the practical optical flow computed using the gradient-based 
approach. Let a f be the estimated motion model parameters using all the data points 

available. Then, the theoretical optical flow (ut(x, y)) at (x,y) according to the 

recovered model is given by: 

Ut(x , y) = a\ + a 2 x + a f z y + q! 7 x 2 + a f s xy 

v t (x, y) = a' 4 + a' 5 x + a' 6 y + a' 8 y 2 + a' 7 xy (51) 

The deviation of the point from the estimated model is computed as the deviation of the 
practical flow from the model (theoretical) flow given by the Mahalanobis distance: 

d = (u p - u t )S“ 1 (u p - u t ) T (52) 
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Points with d > d t axe considered to be outliers, where d t is a threshold value decided 
experimentally. 

4.4.5 Algorithm for Recovery of Single Motion Model 

Prom the above analysis, it is clear that under ideal situations where all flow vectors result 
from a single planar surface motion, a least square fit approach can be easily used to recover 
the motion model parameter. If there is no noise in the estimated optical flow the recovered 
model parameter will be identical to the actual parameters. If noise in the optical flow is 
small, then the least square estimate can provide a good estimate of the parameter vector. 
If there are few data points with large errors, then these points can have significant effect 
on the accuracy of the recovered model parameters. These noise data points, called the 
outliers, need to be detected and removed systematically before reliable estimates can be 
obtained. A simple iterative algorithm based on the least square fit method for recovering 
the motion model from a set of N noisy flow vectors is described in the following four steps. 

1. Compute the model parameter o! using N data points in the least square sense. Data 
points include the coordinate of a pixel in the image plane (r, y) and the optical flow 
(up,u p ) computed at that pixel using the gradient-based approach described earlier. 

2. For each data point, compute the theoretical optical flow (u t ,u e ) using the model 
parameter a! computed in the previous stage. 

3. Compute the Mahalanobis distance d between the practical flow and the theoretical 
model flow. If d is above a threshold d t , consider the data point as an outlier, else 
consider the point for refining the estimate in future iterations. 
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4. If there are no outliers detected in this iteration, the least square algorithm is said 
to converge and the model parameter obtained in this iteration is the best model 
parameter describing the planar surface motion. If one or more outliers are detected 
in this iteration, ignore these data points and repeat steps 1 — 4 for the remaining 
data points. 

4.4.6 Segmentation and Estimation of Multiple Motion 

The algorithm described above can be used only when a sufficient number of good points 
are available. The least square fit algorithm fails if more than 50% of the data points axe in 
error, which is the actual breakdown point for a least square method. This can happen for 
two reasons: 1) Majority of flow vectors, though are resulting from a single planar surface in 
motion, many of these vectors could be in error in addition to the few flow vectors resulting 
from a differently moving object. 2) The data points come from different objects moving 
in different directions. It is also possible that the scene may not be perfectly planar; thus, 
the optical flow vectors computed using an optical flow algorithm, although correct, may 
not follow the planar surface model. 

In [20], the plane parameters were first computed by matching a portion of a previous 
image containing the planar region to the current image. Model parameter thus recovered 
was used to warp the second image to the first. The residual optical flow was then computed 
using the warped image to detect obstacles. In [115], the runway was assumed to be planar. 
Using the known camera motion and the plane parameters, images were warped and residual 
flow was computed. Large residual flow vectors due to obstacles were removed. The residual 
flow at other pixels was assumed to be caused by errors in the initial model parameter and 
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was used to improve the model’s accuracy. In [44], warping parameters were computed using 
the known camera motion and plane parameters. Features were tracked by computing a 
moving average of the optical flow. Pixels with large residual flow were detected as obstacles. 

In this work, we considered a more general problem. We computed the model parameter 
vector from the available flow vectors, and identified the outliers in the set of data points by 
computing the deviation of the optical flow from the estimated motion model, as explained 
earlier. We assumed that the dominant motion was due to the planar surface in motion, 
and that the outliers were mainly due to obstacles. Our algorithm can be used for two 
applications: 

• By computing the dominant motion, obstacles are detected in the form of outliers. In 
addition to detecting outliers, our algorithm also provides the motion model parameter 
for the planar surface on which the object is moving. 

• P lanar motion model is a good assumption if the range to the scene is large compared 
to the focal length. As the camera approaches the ground, however, the background 
may not be perfectly planar, in which case piecewise planar assumption can be used. 

Although Adiv’s work had a similar objective [1], his study uses the affine motion model 
to describe motion at any point, and uses Hough space to detect the multiple motions. 
Each optical flow vector votes to a set of bins in the six-parameter Hough space based 
on the difference between the optical flow value computed using an optical flow algorithm 
[53] and the theoretical optical flow expected for the model defined by the six parameter 
values associated with the bin. All flow vectors from a single rigid object in motion are 
expected to vote into the same bin, resulting in clusters in the Hough space, where each 
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cluster corresponds to separate moving objects. The success of this method depends on the 
knowledge about the range of values for each of the parameters and the resolution of the 
parameter space. The higher the resolution, the more accurate the segmentation, although 
computationally it is going to be very expensive. In [122], an adaptive Hough approach 
is used where, initially, a low resolution Hough space is used. A higher resolution Hough 
space is then derived around the clusters detected in the first Hough space. 

4.4.7 Split and Merge Algorithm 

We assume that the runway or the terrain in which the camera is moving is either planar 
or is piecewise planar. We started by hypothesizing the runway to be perfectly planar, and 
applied planar motion model to all of the available flow vectors. If the initial hypothesis 
failed then we hypothesized the runway to be piecewise planar, and applied planar motion 
models to each of the pieces separately. The outline of the algorithm is shown in the block 
diagram in Fig. 32. 

The algorithm starts with the hypothesis that all flow vectors belong to a single planar 
surface. The model parameter is computed using the least square approach, and outliers 
are detected by computing the deviation of each of the points from the computed model. 
The model parameter is refined by iteratively removing the outliers from the initial set of 
flow vectors. The algorithm is said to converge when no more outliers are detected. If 
the algorithm converges with less than 50% outliers, then the computed plane parameter 
represents the ego-motion of the camera and the structure of the background. Note that u>, 
V, and k can be computed from a. The outliers are either due to noise or independently 
moving objects. If the outliers are distributed randomly, they can be simply ignored. If the 
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Figure 32: Split and merge algorithm 
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optical flow vectors are close together, then they can be considered to belong to a single 
object; a separate motion model parameters can be recovered from these sets of outliers. 

If the least square fit algorithm converges with more than 50% outliers, then the so- 
lution is completely dropped. The background/runway is assumed to be piecewise planar. 
Piecewise planarity can be applied in two ways. 

1 . The algorithm can be applied to non-overlapping local regions of size N x N over the 
entire image and model parameters can be computed for each of these local regions. 
The regions can be then merged based on the spatial constraint and the merging error. 
This is not practical in this study because it is not clear how the feature points are 
distributed on the image plane and, therefore, local regions may not be well defined 
or evenly distributed. 

2. We follow the split and merge algorithm used for intensity image segmentation. In this 
application, we split the image into four parts. The iterative least square algorithm 
is applied independently to each region. Where necessary, the regions were split 
recursively until the least square algorithm succeeds for a given region. Regions are 
then merged if the error after merging is below a threshold. 

4.4.8 Experimental Results 

The motion-based segmentation algorithm for detection of obstacles based on the planar 
motion model for the background/runway with the incorporation of the split and merge 
technique is tested using both simulated and real image sequences. Fig. 33(a) shows frame 
3 of an image sequence obtained using the NASA simulation software. Fig. 33(b) shows 
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the features detected in frame 3, and Fig. 33(c) shows the optical flow computed for the 
feature points using our hierarchical framework. Fig. 33(d) shows flow vectors violating 
the planar motion model constraint, which have been identified as potential obstacles. The 
plane crossing the runway is very clearly detected as an obstacle in the images. (Few false 
alarms are due to noise in the estimated optical flow.) 

Fig. 34(a) shows frame 50 of the real image sequence runway -crossing .new obtained from 
a camera mounted on-board a landing aircraft. Features detected in this image axe shown 
in Fig. 34(b). Fig 34(c) shows the computed optical flow using our optical flow algorithm, 
and 34(d) shows the set of feature vectors identified as obstacles. 

4.5 Performance Characterization 

In this section, we evaluate the performance of our motion model recovery algorithm, as- 
suming that the image contains a single planar surface in motion. The proposed system 
for recovering the motion model parameters from a sequence of images contains two sub- 
algorithms. First, optical flow is computed using the gradient-based optical flow method, 
then these optical flow vectors are used to compute the planar motion model parameters. 
Evaluation of the performance of this motion estimation algorithm requires determining the 
performance of each of these sub- algorithms. 

Unlike the approach described in [47] for characterizing the computer vision algorithms 
in terms of the six components of protocol (i.e., modeling, annotating, estimating, vali- 
dating, propagating, and optimizing), we evaluate the algorithm in terms of the general 
function for which it is being used. For example, the optical flow algorithms are developed 
by applying the optical flow constraint equation within a local region. Thus, the ideal input 
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(c) (d) 

Figure 33: Results obtained for a simulated landing image sequence: (a) Frame 3 of image 
sequence Landing.normaL32L (b)Detected Features (c)Computed optical flow (d)Optical 
flow vectors detected as due to obstacles 
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Figure 34: Results obtained for a real image sequence: (a)Frame 60 of image sequence run- 
way.crossing.new (b) Detected Features (c) Computed optical flow (d) Optical flow vectors 
detected as due to obstacle 
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to this algorithm requires generation of data according to the following constraint equation 
within a local neighborhood. 

I x u + I y v = -I t (53) 

where 7 X , J y , and I t are the spatial and temporal gradients computed using five or more 
images. The output of the algorithm is the least square solution to the above equation within 
the local region. Hence, according to [47], model inputs are gradient values with proper 
noise models for error in gradient computation. The gradient masks for the computation 
of gradient, and the local region size within which the constraint is applied, are the tuning 
parameters of the algorithm. 

Optical flow algorithms have never been evaluated by designing a protocol as suggested 
in [47], partly because of the difficulty in creating the ideal data or proper modeling of 
the noise functions associated with the derivative computation, or perhaps in annotating 
or gathering a representative data set. The computation model underlying the algorithm 
for optical flow involves assumptions about velocity smoothness, single motion, constant 
illumination, etc., which are violated, to a large extent, in real imaging situations. Optical 
flow is the apparent motion of brightness pattern in the image and is only approximately 
equivalent to the 2-D projection of 3-D velocity. 

Various optical flow algorithms were evaluated, and quantitative measures such as flow 
density, mean, and standard deviations in error were reported in [10]. Unlike the approach 
described in [47] for characterizing the computer vision algorithms, Barron et al. [10] do not 
characterize the algorithms by selecting proper model for input, output, and perturbation in 
input and output datasets. They used a set of image sequences to compare the performance, 
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and used angular deviation between the computed and the true displacement vector as the 
performance metric. 

Our objective has been to evaluate the optical flow algorithm and motion estimation 
algorithms together, in terms of their ability to recover the motion model parameter from 
a given sequence of images. Hence, the input to this whole vision system is the sequence of 
images which are input to the optical flow algorithm. The output of this system is motion 
and structure parameters in the form of an eight-parameter vector a p and the covariance of 
the estimated parameter vector E a . To create the image sequences, we assumed that the 
camera was moving in front of a planar surface with sinusoidal and square texture. Using 
the known camera motion and plane parameter, we computed the actual (theoretical) planar 
motion model parameter vector a t and the theoretical optical flow ( u t ,vt ) at every point on 
the image. The optical flow algorithm computed the optical flow (u p , v p ) and the covariance 
of the estimated optical flow 

We evaluated our optical flow algorithm using synthetic images of planar surfaces in 
motion. Since the covariance of the estimated optical flow will be used as weighting function 
in the segmentation stage, we feel that quantitative evaluation of the optical flow should 
consider this weighting function. We use the Mahalanobis distance between the true optical 
flow and the estimated optical flow as the performance measure, which is given by 

Eu — (ttp Uf) (up u t ) (54) 

Input to the algorithm will be image sequences obtained using a camera moving in front 
of a planar surface. We used square and sinusoidal texture for the plane. A sequence of 
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15 images was obtained for different velocities of the camera. Figures 35 and 36 show 
frames 1, 5, 10, and 15 of a 15- frame image sequence simulating a camera moving with a 
velocity of (15, 0, 0) in front of a planar surface, where the camera was at a height of 100 
feet from the planar surface with an inclination of 60° with respect to the ground plane. 
These image sequences were input to the optical flow algorithm. Using the planar motion 
model and known camera geometry, a parametric form for optical flow was derived. Using 
this parametric form, the theoretical optical flow was computed at pixels where practical 
optical flow using the flow algorithm was available. The Mahalanobis distance Eu between 
the theoretical optical flow and the practical optical flow at pixel (r, y) on the image plane 
was computed. 

Figures 37(a) and 38(a) show the probability distribution of pixels n* with error E n (i) 
as a percentage of the total number of available optical flow values N. Since E u is a 
continuous function, the E u axis has been quantized at discrete intervals. Figures 37(b) 
and 38(b) show the cumulative distribution of pixels with error E u , where, at any point 
E u (i) along the X axis, the corresponding Y axis represents the total number of pixels with 
E u < E u (i). These plots show the likely ratio of good points and the goodness of those 
points in the form of Mahalanobis distance. Although the exact number of points may vary, 
the distribution remains almost identical for different image sequences. 

The second stage of our algorithm uses these optical flow vectors to recover the model 
parameter. We consider the model recovery algorithm as a black box which receives sets of 
optical flow vectors as input and outputs the model parameter that best fits the data points 
input to the unit. The optical flow vectors were already grouped into bins of decreasing 
accuracy in the earlier stage. These bins are input to the model recovery algorithm in 
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steps; at every step, a new set of data is added to the earlier set. Model recovery is done 
first, using the set of data points which are most accurate. As each new set of data points 
is added, the performance of the algorithm degrades. Let the recovered model parameter 
at each step be given by the vector d p , with covariance of the estimation given by £ a . 
Let the theoretical parametric model using which the images were created be a t . We use 
the Mahalanobis distance between the theoretical and practical model parameters as the 
performance metrics, as given by 


Ea — ( a p (dp dt ) (55) 

Figures 37(c) and 38(c) show the plot of variation of E a with E u . 

4.6 Summary 

In this section, we proposed a new motion-based segmentation algorithm for detecting 
obstacles in a sequence of images obtained from a video camera mounted on-board a landing 
aircraft. Unlike the PMMW images, images obtained using a video sensor are of high 
resolution and high quality. Features like tire marks, runway markers, etc., are clearly 
visible in these image sequences. Thus, a simple histogram-based thresholding based on 
homogeneity assumptions cannot be used. We have assumed that the runway is perfectly 
planar or is piecewise planar and used a planar motion model to group features of the 
runway. Optical flow identifying objects of nonzero height with reference to the runway 
plane - stationary or moving - violates the planar motion model; these are detected as 
obstacles. 
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(C) (d) 

Figure 35: Simulated images of planar surface in motion: (a)Frame 1 (b)Frame 5 (c)Frame 
10 (d)Frame 15 
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Figure 36: Simulated images of planar surface in motion: (a)Frame 1 (b)Frame 5 (c)Frame 
10 (d)Frame 15 
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(a) 




(c) 


Figure 37: Performance plots: (a)Optical flow density vs. E u (b)Cumulative distribution 
of flow vector vs. E u (c )E a vs E u 
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Figure 38: Performance plots: (a)Optical flow density vs. E u (b) Cumulative distribution 
of flow vector vs. E u (c )E a vs E u 
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To compute the optical flow, we proposed combining both feature-based and flow-based 
approaches in a hierarchical framework. We computed the optical flow at corner-like feature 
points, where the optical flow was expected to be more reliable, compared to regions of 
uniform intensity and edge regions. Features axe detected using the SUSAN corner detector, 
and optical flow is computed using the Maximum Likelihood Estimate based on an optical 
flow constraint equation applied to a small local region around the feature. Since the optical 
flow due to ego-motion of the camera is large, we proposed a hierarchical approach using a 
Gaussian pyramid for computing the optical flow. This makes our approach applicable to 
general situations where ego-motion and structure of the background are unknown, unlike 
warping approaches, which require prior knowledge about motion and structure to detect 
obstacles by compensating for the ego-motion. 

The motion-based segmentation algorithm proposed in this work was based on a planar 
motion model. It starts by hypothesizing that every flow vector is due to a single planar 
motion. Least square model fitting was used to compute the best model parameter for the 
set of optical flow vectors computed in the previous stage. The hypothesis was verified by 
computing the Mahalanobis distance between the computed optical flow and the estimated 
model flow. Optical flow vectors not satisfying the estimated planar motion model were 
identified as outliers and rejected. The model parameters were recomputed without the 
outliers. If the initial hypothesis failed (too many outliers), the solution was dropped and 
the image was split into four regions. The planar motion model was then applied to each 
of the regions recursively. We tested our algorithm using both synthetic and real images. 

In our proposed approach for obstacle detection by motion-based segmentation, in ad- 
dition to segmentation by hypothesizing the background to be a planar surface we also 
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computed the motion and structure parameters in the form of an eight-parameter vector 
a. Actual camera motion parameters and the plane parameter can be estimated from the 
model parameter vector; however, computation can be done only to a scale factor. This is a 
fundamental problem in any motion and structure estimation method that uses monocular 
image sequences. If either the camera motion parameter or the plane parameter is known, 
then the other parameter can be computed uniquely. For segmentation, however, estimation 
of these parameters is not required. 

Adiv’s Hough-based approach had a similar objective; however, in his method, prior 
knowledge of range of parameters was required to create the Hough space. The accuracy of 
his approach depended on the resolution of the Hough space. Segmentation using the Hough 
method is very time consuming. In general, to form a Hough space of six dimensions (Adiv 
used affine model, and used only first order terms in the flow equations) with a resolution of 
B bins in each dimension will require Q ( B e ) memory space. Voting by N features into this 
Hough space will require O {B 6 N) computer operation. To improve accuracy, resolution 
needs to be increased, resulting in exponential growth both in required memory space and 
run-time. Our algorithm requires formation ofFxP matrix (where P = 8, the dimension 
of the parameter vector a) using N features in O (N) time and computing the inverse of the 
matrix using O (-P 3 ) run-time algorithms. (Note that, though X is a 2 N x P matrix, matrix 
X T X is of dimension P x P and can be computed directly without having to form matrix 
X in only O (AT) time.) The overall running time for recovery of motion model parameter 
in one iteration is linear in number of features N . Detecting outliers in an iteration requires 
only O (Af) operations. Thus, detecting an obstacle by applying our algorithm to a group 
of N features in K iterations takes only Q ( KN ) computer operations. 
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The algorithm was able to detect an obstacle in a planar background by segmenting the 
dominant ego-motion and identifying the outliers with respect to the ego-motion. If the 
image contains too many objects, however, our algorithm might fit a single motion model to 
all the motion vectors resulting from such image sequences. Splitting the image can localize 
such motions, although too many splits could result in an insufficient number of features 
to compute the motion model parameters reliably. One approach might be to use a high 
resolution Hough space for each local region, where the range of the Hough space is decided 
by the model parameter estimated using our algorithm. 

Density of feature vectors is also critical to the success of our algorithm. If the planar 
surface is not textured, our optical flow algorithm will result in sparse flow vectors. The 
performance of the planar motion model recovery approach will degrade as the ratio of 
potential outliers (points from the obstacles as well as noisy flow vectors) to points from 
the planar surface increases. One obvious solution is to use a conventional optical flow 
algorithm, which produces dense optical flow vectors. 

We have also evaluated the optical flow algorithm and the motion model recovery method 
using least square fit. Since the objective of the approach is to recover the planar motion 
model from a sequence of images, input images are created using a known model and 
recovered model parameters are compared to the input model parameter. Barron et al. 
used angular deviation between the known optical flow and the computed optical flow 
as the performance metric. We used Mahalanobis distance between the true optical flow 
and computed optical flow as the performance measure. Optical flow vectors were sorted 
based on the Mahalanobis distance. These optical flow vectors were used to recover the 
model parameters. Model parameters thus recovered were evaluated using the Mahalanobis 
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distance between the true model parameter and the recovered model parameters. 
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5 Structure and Motion Estimation from Line Features 


In the motion-based segmentation approach described in Section 4, optical flow is computed 
at corner-like feature points. Based on the computed optical flow, feature points are sorted 
as belonging to separate planar surfaces in motion by computing the planar motion model 
in the form of an eight-parameter vector a, then measuring the individual flow vector’s 
fitness to the model. The actual motion and structure parameters can be computed from 
the model parameter vector a. The motion and structure parameters include the relative 
motion between the camera and the planar surface, and the plane parameters. In case of 
monocular image sequences, the motion and structure parameters can be computed only 
to a scale factor, because a camera moving with a velocity V in front of a planar surface 
at distance d and a camera moving with a velocity kV in front of the planar surface at 
distance kd, produces the same optical flow on the image plane, where k is a scale factor. 
However, if one of the two data are available (i.e., camera motion or plane parameter), the 
other parameter can be uniquely determined. 

In this section, we propose a method for computing 3-D position and velocity using 
line features. Most of the point features used in the previous experiments were due to 
the texture created by the tire marks, and not to true physical corners. Line features are 
due to the runway markers, and are prominent and robust. First, we propose a method 
for computing the plane parameters from line features, assuming that the line features in 
an image correspond to lines on a 3-D plane and that the camera motion is known. We 
also propose an algorithm for estimating the position and velocity of 3-D objects using line 
features, assuming the object is moving on a planar surface. Since the end points of a 
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line cannot be detected reliably, we propose an approach for matching and tracking line 
segments in image sequences. 

5.1 Estimation of Plane Parameters 

Few methods for computing motion and structure using line features have been proposed 
in the past [25, 37, 50, 68, 67, 103, 129, 133]. In general, a minimum of three views are 
required to solve motion and structure problem using line features, unlike the two frames 
required in case of point features. When the lines are on the same plane, however, two views 
are sufficient [114]. We propose a new method for computing the plane parameter from line 
features using the knowledge of the camera motion and assuming that line features result 
from the 3-D lines on a single planar surface. 

5.1.1 Analysis 

Fig. 39 shows the imaging geometry used in this analysis. The world coordinate system 
(WCS) is represented by the three mutually perpendicular axes X, Y, and Z. C and C f 
are positions of the camera at two different time instants t\ and t<i- The camera coordinate 
system (CCS) is fixed to the camera at the optical center and is represented by three axes 
X 8 , Y s , and with X s pointing in the direction of the optical axis of the camera. The 
image plane is situated at a distance /, the focal length of the camera, from the optical 
center, with axes x and y. L\ is a line on a planar surface whose equation in the WCS is 
given by AX + BY + CZ + D = 0. Let s\ and s[ be the line segments in the image plane 
corresponding to the line L\ with the camera center located at C and C* respectively. Let 
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the equations of the line segments s i and s[ in the image plane be 


ax + by + c — 0 

ax + b'y + c = 0 (56) 

The camera center C and the line segment si form a plane Pi, which intersects the plane P 2 
formed by the camera center C* and the line segment s[ at the 3D line L\. The equations 
of these projecting planes in the respective CCS are given by 

aX s + bY s + cZ s = 0 

a'X' + b'Yj + c'Z^ = 0 (57) 

Note that (a, b, c ) and (a', b', d) are the normal to the planes Pi and P 2 , respectively, 
in the respective CCS. If Rwc and R'wc aie rotation vector from WCS to the CCS at C 
and C\ then normal ni and n[ to the projection plane Pi and P 2 in the WCS is given by 

7i\ — [ ai b\ ci] Rwc b c] 

n\ = [o', b\ c;] T = iC ‘[a'&'cf (58) 

These two planes intersect to form the line L\ in the WCS. The direction vector of this line 
is given by the cross product of n\ and n\ : 


L\ = [^iCj — b^Ci, o^ci — aic[, aifej — 


nT 


(59) 
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The equation of the plane P 3 normal to L\ is: 

(b\ c[ - b[c\) X + (a[ci - aic\) Y + (ai b[ - bia[) Z + t = 0; (60) 


where t is a constant. The planes Pi, P 2 , and P 3 intersect along the line L\ whose equation 
is given by 



where (Jf;, Yi, Zi) is a point on the line Li, (Z, m, n) is the direction vector of the line, and t 
is a constant. Prom corresponding line segments s\ and s[ in two frames, we can compute 
the line segment Pi. If L\ and L 2 are two lines in the same 3-D plane with the equation 
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where i = 1, 2, then the normal to the plane is given by: 


A = m\ri2 — m2«i 
B = ^2^1 — l\ n 2 

C = liirii — hmi (63) 

This will fail if the two lines in 3-D are parallel. This can be solved by considering a line 
connecting two points (Xj, Y x . Z\) on L\ and a point (X 2 , Y 2 , Z 2 ) on L 2 . Then the 
normal to the plane is given by: 

A - mi(Z2 — Z\) — ni(Y2 — Yi) 
b = m(X 2 - X!) - h(z 2 - Zi) 

C = h{Y2-Y{)-m x {X2-X{) 

D = (X 2 - Xi)(n\Yi - miZi) + (Y 2 — Y x )(Zili — niXi) + {Z 2 - Zi)(m x Xi - l\Yi) 

(64) 


5.1.2 Feature Detection and Matching 

The above approach requires a minimum of two line correspondences in two frames. In 
this work, we used line features resulting from the runway markers, since they are long, 
easy to detect, and can be easily tracked. First, edges were detected by applying the 
Canny edge detector to the input image sequence [19]. To detect the straight lines from 
the edge image, we used the software package Object Recognition Toolkit ( ORT) developed 
by Etemadi et al. [35]. This software package takes an input edge image and reports the 
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detected straight line segments, circular arcs, and various junction between lines, such as 
the T and Y junctions. In our application, we were interested only in straight line segments. 
Each line was represented by several parameters including the two end points, mid-point, 
length, orientation, the variance in length and orientation, etc. Edge images obtained for 
frames 50 and 55 of the image sequence runway-crossing-new using the Canny edge detector 
are shown in Fig. 40(a) and Fig. 40(b), and the line segments detected by applying the ORT 
software to the edge image are shown in Fig. 40(c) and Fig. 40(d). Noisy line segments due 
to tire marks and other small line segments were later ignored by using a proper threshold 
on the length of the line segments. 

The next step was to use proper representation for each line segment and develop a 
method for matching line segments in the current frame /i to the line segments in the 
next frame $2 using the representation. It is clear from Fig. 40 that the endpoints and the 
midpoints which were commonly used in tracking line features were unreliable because the 
segments can be broken from one frame to another, and that endpoints and midpoints were 
not located in the image plane and so could not be detected. Therefore, we use the normal 
representation of line given by the equation: 

x cosQ + y sinQ = p (65) 

where p is the perpendicular distance from the origin to the line, and 6 is the angle of the line 
with respect to the x axis. This representation was used in the detection of line segments 
using the Hough transform [45] by grouping collinear points of an edge image in the (p, 6) 
parameter space. According to this representation, a straight line in ( x , y) space becomes 
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(c) (d) 

Figure 40: Canny edge image: (a)Frame 50 (b)Frame 55. Line Segments detected using 
ORT software: (c)Frame 50 (d)Frame 55. 
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a point in the (p, 8) space. Except for collinear lines, each line segment in (x,y) space 
corresponds to a unique point in the (p, 8) space. If the (p, 8) parameter space is quantized 
at N regular intervals in each of the two dimensions, then searching for a matching line 
is equivalent to searching the bin corresponding to the normal parameter (i.e., p and 8) of 
the given line. Based on the normal representation for line segments, we have developed a 
three-stage algorithm to find a best match sj in frame fe for a line segment S{ in frame f \ : 

• Compute the prediction s ? in frame / 2 , 

• Search for initial matches using Hough space, and 

• Find the best match among the initial matches. 

The position of a line segment s* in frame / 2 was predicted by assuming the position 
of every pixel along the line by computing the optical flow at those pixels. A least square 
approach was then used to fit a best line to the predicted positions. Since full optical flow 
cannot be computed at edge pixels, we computed only the normal flow (i.e., the optical 
flow in the direction of the gradient) at every pixel along the line segment. Normal flow 
computation was sufficient for this application, since we were interested only in matching 
the line segments, and not the whole line. Thus, the motion of the line in the direction of the 
line segment was not required. In addition, normal flow could be computed more accurately 
than full flow, since computation of normal flow does not need additional assumption such 
as the smoothness constraint used in the computation of full optical flow. Normal flow 
at any pixel (x, y ) in the image plane can be computed immediately from the optical flow 
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Normal flow 



Line in frame 1 


Predicted line in 
frame2 


Figure 41: Predicting the line in the next frame using normal flow 


constraint equation as: 



( 66 ) 


where I x , I y , and It are spatial and temporal gradients. Using the computed normal flow at 
every pixel along the line segment s* in frame /i, the pixel positions for the corresponding 
line s' in the next frame can be predicted as shown in Fig. 41. A best line was then fit to 
the predicted pixel positions in frame /2 to compute the predicted line segment s tp [33]. 

Predicted line Si p is only an approximation to the actual line segment s' in frame / 2 - 
However, the prediction can be used to minimize our search for the best match in frame 
/ 2 . To search for possible candidates for matching, we first map every line segment from 
frame / 2 to a N x N Hough space. Each line segment detected in frame fi votes to a 
separate bin in the two-dimensional Hough space, based on the p and 6 values of the line 
segment. Hough parameters are also computed for the predicted line segment Si p . Under 
ideal conditions, when there is no noise, the predicted line segment Sj p will be aligned with 
the actual matching line segment s' in /2 and, hence, will map to the correct bin. However, 
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due to error in the process of edge detection, line detection, normal flow computation, line 
fitting, etc., the predicted line parameter may not be exactly equal to that of the actual line 
segment found in frame fa. Hence, a search space of 3 x 3 around the predicted parameter 
in the Hough space was used to find the matching line segment as shown in Fig. 42(a). 
Sometimes multiple matches could be found within the search region. Another stage of 
filtering might be necessary to find the best match. 

If multiple matches were detected, a best match was found by computing the spatial 
proximity of the predicted line to the actual line segment, as shown in Fig. 42(b). Per- 
pendicular distances from the the endpoints of the predicted line to the each of the line 
segments matched in the Hough space were computed. If (x p ,y p ) and ( x q ,y q ) are the end- 
points of the predicted line, and a{X + fyy + c* = 0 is the equation of the ith matched line 
using the Hough space, the distance measure for the ith matched line is computed as: 

di — d p “f* dq 

_ diX p + b{y p -f- c* diXq H- biyq + Cj 

= w 7 W* 

The line segment s[ with minimum distance di is selected as the best match. 

5.1.3 Discussion 

We tested our proposed approach for computing the plane parameter for the runway us- 
ing the line features due to the runway markers and known camera motion. A real image 
sequence, runway^crossing.new , was used in this experiment. Using this sequence, we con- 
ducted two tests to explore the feasibility of the proposed approach. In the first test, we 
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p 


(a) 



Figure 42: Matching line segments: (a) Initial match: A 3 x 3 search region is used around 
the predicted parameter value in the hough space. (b)Final match: Perpendicular distance 
at the two end points are computed 
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used the line segments resulting from the runway markers; in the second test, we used the 
line features resulting from the taxiway. Although line features could be matched accu- 
rately, the computed plane parameters were not correct. Failure of the proposed approach 
can be attributed to either the degenerating case arising from the actual imaging geometry 
when the camera is moving in the direction of the 3-D line, or the inability to resolve a 
small angle between the projecting planes when the direction of motion of the camera is at 
90° to the 3-D line. 

Consider a situation where an aircraft is flying at a height of 50 ft and approaching for 
landing at an angle of 3° to the runway. In this situation, the runway markers - - which 
are in the direction of the optical axis of the camera - - are almost parallel to the camera 
motion. The projecting planes P\ and P[ due to the 3-D line L\ and its corresponding image 
segment s\ and s[ at two positions of the camera C and C f are almost aligned. Therefore, 
normals to these planes are identical, resulting in a degenerate case. 

The best line features to consider were those that were at 90° to the direction of camera 
motion. In the above landing situation, the range to the aim point is « 950/2. Without 
loss of generality, we assume a cross runway at a range of 300/2. from the current position 
of the aircraft, and the velocity of the aircraft 60 ft/ sec in the direction of motion. Our 
aim is to find the angle between the projecting planes resulting from the edge of the cross 
runway at two different camera positions. 

To make the worst case analysis, we will consider two frames that are one second apart. 
From simple trigonometry, we can show that, when the aircraft is at a height of 50/2, 
the projecting plane due to the cross runway is at an angle of « 9.5° to the runway plane 
intersecting the runway plane 300/2 from the aircraft. After 1 second the same cross runway 
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will be ss 240 ft from the aircraft, and the projecting plane due to the cross runway will be 
at an angle of « 11.1° to the runway plane. This means that the two projecting planes Pi 
and P[ 30 frames apart resulting from a 3-D line L\ at a range of 300 ft from the initial 
position of the aircraft on a planar runway are at an angle of only 1.6° to each other. It 
is quite impossible to resolve this angle with all the error in line detection, optical flow 
computation, etc. 

5.2 Estimation of 3D Position and Velocity 

Although image regions due to moving objects can be detected using the motion-based 
segmentation algorithm described in Section 3, estimating the position and velocity of such 
objects using a monocular image sequence is still an ill-posed problem. If the motion of 
the camera is known and the object is stationary, the position of the object in 3-D can 
be computed using motion stereo. The range estimation algorithms for estimating the 
distance to static objects using monocular image sequence, described in [108, 109, 110] use 
motion stereo to compute the initial range to the object. This initial value is used to start 
a Kalman filter, which recursively refined the estimate by including the new observations 
obtained by tracking the object in subsequent frames. If both the camera and the object 
are in motion, then the solution for position and velocity of the object is not unique - - even 
when the position and velocity of the camera are known accurately. Two possible solutions 
to this problem are either to apply certain constraints on the object motion ( constraint-based 
solution ) or to use stereo image sequences ( binocular stereo or trinocular stereo). 

In the constraint-based solution approach, the object is assumed to be moving on a 
known terrain (i.e., X = f(Y, Z)). Hence, the range, X , to the object is known, and the 
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object’s velocity in the direction of range is constrained by the equation: 

Vx = fy(Y 9 Z)V Y + fz(Y,Z)Vz 

where fy and fz are the partial derivatives of / (Y, Z) and Vx, Vy, and Vz are the X, Y, 
and Z components for the velocity vector. Since X and Vx are constrained, the position 
and velocity of the object in 3-D can be uniquely computed using two frames and a Kalman 
filter similar to those described in [108, 120] can be used to refine the estimate by tracking 
image features in subsequent frames. 

The stereo method uses two or more cameras and computes the range to the object 
by triangulation of corresponding image points in two frames. A Kalman filter can be 
initialized with the position and velocity estimate obtained from the initial frames. The 
estimate can be refined by considering additional measurements obtained by tracking the 
features in subsequent frames. 

In this research, we use the constraint-based solution approach. We assume that the 
object is moving on a planar surface. Given the camera’s motion and the feature locations 
in successive images, estimation of the motion parameters of the tracked object can be 
formulated as a state estimation problem using a Kalman filter. The position and velocity 
of the camera are known from the GPS and INS, and the initial estimate of the plane 
parameter is available from the motion-based segmentation algorithm described in Section 4 
or the line-based estimation approach described in Section 5.1. The Kalman filter improves 
the initial estimate by combining the redundant measurements obtained by tracking the 
features in subsequent frames. The modeling of the motion kinetics and the derivation of 
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the Kalman filter based on the model are described in the following section. 

5.2.1 Analysis 

We assume that the object is moving on a planar surface. The motion of the object in the 
3-D space is purely translational with constant velocity V = (Vx,Vy,Vz) and the equation 
of the plane is nxX + n y Y + nzZ + d = 0. The velocity, Vx, in the direction of range is 
given as: 

v x = ~ nyVV ~ mV* (68) 

n x 

The equation of the plane in the camera coordinate system is given by: 


kiX + k 2 Y + k 3 Z = l 


(69) 


Plane parameter k — [&i k 2 £3] in the CCS can be computed as: 

- Rn 

= _ T-n + d 


(70) 


where T is the translation of the camera and R is the rotation matrix of the camera at time 
t\ with respect to the origin of the WCS and is given as: 


( Tx\ 


T = 


T y 


\ 


T Z 


) 


R = 


-R11 

R 2 \ 

\ K -^31 


R\2 7 ? i 3 

R22 R-23 
R32 R 33 j 


(71) 


Let ( X , Y, Z) be the coordinates of a point P on the object at time ti in the camera 
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coordinate system (CCS). The image coordinate of the point at t] is given as: 


t Y f Z 

X = f x’ y = / x 


(72) 


The coordinate of the same point at time i 2 (t 2 = t\ + r, where r is the time interval 
between the two consecutive frames) in the CCS is given by: 


r X' 1 


r 

= 

L & I 



ni r 12 ri3 
r 21 r 22 r 23 

^31 r 32 r 33 




y 

z 


+ 


{ tx- tV'x ^ 
<y - rVy 
{ tz~ rV’ z ) 


(73) 


where \tx , is the translation of the camera in CCS at time t\. The velocity vector 

V' = [V'x, Vy, V Z ] T for the object at time fi in the CCS is given by V' = rV . The image 
coordinates of the point at time <2 is given by: 


(r 2 iX + r 22 y + r 28 Z) + (Ty — rVy) 
; (r n X + r 12 Y + r 13 Z) + (T x - rV£) 

(r3iX + r 32 y + r~33Z) + (Tg — tVz) 
1 (r n X + r n Y + r 13 Z) + (T x - rV^) 


Substituting from 69 and 72 in equation 74 we get 


Q 1 / + q 2 a: + a3y 
07 / + a 8 x + a 9 y 
04/ + 05s + 06 V 
07 / + a 8 a: + a 9 y 


(75) 
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where 


ai — ^21 + (Ty — rVy) k\, a% = r22 + (Ty — T Vy) &2> °3 = ^23 + (Ty — rVy) £3 
04 = r 31 + (Tz — tV'z ) ki, 05 = r32 + (Tz — tVz) &2, 06 = T 33 + ( Tz — tV'z ) A :3 
°7 = rn + ( Tx - tV'x ) ki, as = r 12 + (Ty — rV£) £2, 09 = **13 + (T* — tV' x ) ^3 

(76) 

Defining the state vector S = (nx ny nz Vy Vz) and the measurement vector M = 
(mi m 2 ) = ( x y), the state equation and measurement equation in discrete time system can 
be expressed as: 

$k+l = $kSk + 

Mk = m ( S k ) + v k 

= [mi (Sk ) , m 2 (Sjt)] + v k (77) 

where the state transition matrix is a Unity matrix, and w k and v k are the process 
and measurement noise respectively. Zero mean Gaussian noise is assumed, such that 
Wk ~ N(0,Qk) and vk ~ N(0,R k )- It is clear from Eq. 75 that the measurement equation 
is a nonlinear function of the state vector. An extended Kalman filter is thus necessary. 
The measurement equation is linearized about the current estimate of S giving: 

M k = H k ($* ) S k + v k (78) 
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where 



The Kalman filter consists of two parts. 


(79) 


1. Measurement Update : The measurement update is done whenever a new measurement 
is available. Prior to processing a new measurement M*, we have the estimated value 
of the state S and the covariance of the estimate P(k ), Q (k) and R(k). The new 
measurement improves our estimate of the state and its covariance. The updated 
values are 


S+ = S t -+ir*[M.-m(s t -)] 

p, f - (so) 


where the Kalman filter gain Kk is computed as 


Kk = PkHl [H k (S fc ~) P^Hl (5* ) + R k \ 


(81) 


2. Time Update: This part of the filter accounts for the system dynamics, and propa- 
gates the state and its covariance matrix until the next measurement is made. The 
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propagated values are: 


s; +l = Mf 

n+ 1 = (82) 

Starting from Eq. 81, the Kalman filter will run recursively as new measurements are 
available. The initial estimate for n is provided by the segmentation algorithm, and the 
estimate for V is computed by triangulation on the first two frames using the Eq. 75. 

5.2.2 Experimental Results 

We tested the above tracking and estimation algorithm using the real image sequence run- 
way-crossing-new. We used the line features in this experiment. Line features in the image 
are tracked using their two endpoints. The matching algorithm described in Section 5.1 
was used to match the line segments in successive frames. Separate Kalman filter is used to 
track each of the endpoint of a line. Figures 43(a) and 43(b) shows the zoomed portion of 
the frames 50 and 51 of the original image sequence containing the truck. Figures 43(c) and 
43(d) shows the edge image obtained using the Canny edge detector, and Figures 43(e) and 
43(f) shows the ORT software output, with the endpoints of the lines clearly marked on 
the image. Line segments were tracked using the matching algorithm described in section 
5.1.2. The Kalman filter did not converge to the proper value for the motion and struc- 
ture parameter. This is because line segments were broken from frame to frame, and their 
endpoints could not localized. Hence estimates done using the endpoints were not correct. 
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(e) (f) 

Figure 43: Tracking line segments: (a), (b) Zoomed portion of the original image containing 
the truck; (c), (d) canny edge image; (e), (f) ORT software output 
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5.3 Summary 


In this section, we explored the feasibility of using line features for estimating the motion 
and structure parameters from image sequences obtained from a single moving camera. 
Line features are prominent, and are expected to be more reliable than point features. Line 
features due to runway markers or other man-made textures on the runway can be used to 
estimate the plane parameter of the planar runway. However, due to a very small angle of 
separation between the projecting planes formed by the line segments and the corresponding 
runway markers, and error in localizing the image features the plane parameter could not 
be recovered reliably for this application. 

Estimating the motion of an object from a monocular image sequence obtained from a 
moving camera is a ill-conditioned problem. In this section, we propose solving this problem 
using a constraint-based solution approach, where we assume that the object is moving on a 
planar surface and that the motion of the camera is known. We have derived a Kalman filter 
for computing the plane parameter and the velocity of the object simultaneously. Either 
point or line features can be used to estimate the motion and structure. We proposed using 
line features. We tracked the line features using the endpoints. Due to difficulty in matching 
the endpoints, we were not able to get good results. 

We also described algorithms for matching line segments using normal representation. 
We follow the predict-and-match paradigm, where the line segment is first predicted by 
computing the normal flow for the line segment. Matching is done in two stages: first, 
approximate matches are found by searching the Hough space, and then a fine match is 
performed by using the minimum distance constraint. 
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6 Conclusions 


The ability to detect obstacles in a sequence of images is an essential first step in automation 
of low altitude flight, landing, takeoff, and taxiing phases of aircraft navigation. In this 
research, we explored the feasibility of computer vision algorithms for detecting obstacles 
in monocular image sequences obtained by on-board sensors of two different modalities - 
a low- resolution PMMW sensor and a conventional video sensor - for incorporation into 
an aircraft-based Synthetic Vision System (SVS). Using this information and various other 
data, SVS can complement the ground-based systems in various functions such as runway 
incursion, and obstacle detection and avoidance. 

To facilitate seeing through fog, the SVS is envisioned to be equipped with a PMMW 
sensor. Because of the low resolution and poor quality of the images obtained using these 
sensors, we proposed and implemented a model-based recognition approach for detecting 
image regions corresponding to the runway. A histogram-based thresholding approach was 
then applied within the runway region for detecting obstacles. To detect obstacles in video 
image sequences, we proposed and implemented a new motion-based segmentation algo- 
rithm. This included a novel combination of feature-based and flow-based approaches 
within a hierarchical multi-resolution frame work for computing the optical flow, where 
a gradient-based approach is applied over a small local region around corner-like features. 
These optical flow vectors are then grouped into different regions by recursively applying 
the planar motion model on image regions. 

Though the main objective of the research was obstacle detection, we also focussed on 
such other issues as estimating camera position from image-based features and improving 
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camera position information using multiple features. This is useful, since the camera posi- 
tion information available from the GPS is not accurate and the frequency interval of the 
data is not sufficient for many problem solving methods in computer vision. 

We have also explored the use of line features for estimating the motion and structure 
from monocular image sequences. We have described an algorithm for computing the plane 
parameter using only line features. Line features due to the runway markers are used. 
At least two line segments in two frames and knowledge of camera motion parameters are 
required to compute the plane parameter for the runway. A Kalman filter based recursive 
algorithm for estimating the motion of an object using point or line features contained 
within the obstacle region is described. 

6.1 Primary Contribution 

• Computation of reliable optical flow has been always a difficult task. In this work, 
we proposed a selective, gradient-based approach to be used at points where the 
computation of optical flow is expected to be more reliable. This differs from the 
conventional optical flow algorithm where, theoretically, optical flow is computed at 
every pixel. Inclusion of too much noisy optical flow would result in large errors in 
the interpretation stage. Preselecting the reliable data points in advance would not 
only improve the performance of future computational procedures, but would also 
improve the execution time of both the optical flow computation algorithm and the 
interpretation algorithm. 

• Field-based approaches fail if the optical flow to be computed is large. In many real 
life applications, such as aircraft navigation, road navigation, etc., ego- motion of the 
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sensor results in large optical flow. Although feature-based approaches can measure 
large optical flow, they are computationally very expensive due to the large search 
space. We propose using our selective, gradient-based approach within a hierarchical 
framework, and using the Gaussian pyramid to compute large ego-motion. Optical 
flow computed at lower resolution is predicted to higher resolution, and a corrective 
flow is computed and added to the original flow. 

• We proposed a new recursive motion-based segmentation algorithm which could be 
used to recover both single and multiple motion. In cases of single motion, the algo- 
rithm can tolerate larger error in the optical flow by splitting the image into smaller 
regions, then detecting the outliers within each region separately. In cases of multiple 
motions, the proposed framework hypothesizes each motion to follow a planar motion 
model and uses the split and merge approach to hypothesize and verify these regions. 

The Hough method for multiple motion segmentation requires prior knowledge of 
the range of the model parameters, and the accuracy of the method depends on the 
resolution of the Hough space. The warping procedure requires complete knowledge 
of cam era motion and the terrain in which the camera is moving. We did not make 
any assumptions about the camera motion. We only made use of planar motion model 
to hypothesize the flow vectors as belonging to one or more region, and verified the 
accuracy of the hypotheses by comparing the deviation of the computed optical flow 
from that of the hypothesized model flow. 

• Our motion-based segmentation approach for detecting multiple motions and esti- 
mating the motion and structure parameter in terms of planar surface motion model 
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parameter is computationally more efficient than the Hough-based approach. In gen- 
eral, to form a Hough space of six dimensions with a resolution of B bins in each 
t . dimension, will require O (# 6 ) memory space. Voting by N features into this Hough 

space will require Q (B^N) computer operation. In general, using our algorithm to 
segment multiple motions by applying the least square model fit approach recursively 
on a group of N features in K iterations takes only 0 ( KN ) computer operations. 

• An analytical model for computing the error in camera position estimated using image- 
based features has been developed as was a set of equations for computing the error in 
various camera parameters. A new algorithmic approach to calibrate cameras using 
image-based features is described in this report. 

• We have proposed using the Mahalanobis distance as the measure for evaluating the 
performance of the optical flow algorithm, unlike earlier approaches, which simply 
compute the density, average error, and the standard deviation to measure perfor- 
mance. Since the theoretical models used for computing the optical flow (optical flow 
constraint with local smoothness model) differ from the one used in the interpretation 
stage (planar motion model), we have proposed a new performance characterization 
which relates the error in estimated optical flow to the error in recovered parameters 
using these optical flow vectors. 

6.2 Future Research 

In this report we developed algorithms for processing images from PMMW sensors and 
video sensors separately. An interesting problem, which has applications, both in robotics 
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and navigation, is integration of information from different sensors. This can be done in 
two ways: 

• First, images from different sensors could be fused into a single composite image, then 
the fused image could be processed for detection of obstacles and other objects of 
interest. 

• First, a different class of algorithms could be used to process and extract information 
from each of the sensor images, then the extracted information could be fused. 

Such a multi-sensor system can essentially improve the resolution and thereby improve 
the performance of the segmentation and estimation algorithm. This will also avoid the 
necessity for switching between the sensors, depending on the lighting and the weather 
condition. 

In this work, we have used only monocular image sequences. It is widely known that it is 
impossible to estimate the position and velocity of a moving object obtained from a moving 
camera unless additional constraints (shape and size of the object, terrain information in 
which the object is moving, etc.) are provided. Use of stereo, however, can provide unique 
solution. Stereo image processing poses additional problems, such as stereo correspondence, 
wide enough base distance, etc. More research would be required to use the stereo images 
for this application. 

Finally, there are real time implementation issues. We have evaluated some of the 
on-the-shelf image processing hardware for implementation into our model-based object 
detection in PMMW image sequences. Computer vision and image processing problems are 
computationally intensive. There Eire two ways to speed up the computation. The first is 
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to build an algorithm-specific architecture, which would require analysis of the algorithms 
and development of data-flow architecture suitable for the application. The second method 
is to split the computation among networked work stations or on a multiprocessor system. 
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